(54) Program conversion apparatus for constant reconstructing VLIW processor


(57) A program conversion apparatus includes: the
constant division unit 12 for specifying instructions in the
serial assembler code 42 that use large constants which
cannot be arranged within the operation fields of object
VLIWs and for dividing the specified instructions into
divided constant set instructions for storing pieces of the
large constants into the specialized constant buffer 107
of a VLIW processor and divided constant use
instructions for performing operations using the stored
constants; the dependence graph generation unit 20 for
generating a dependence graph based on the execution
order of each instruction in the serial assembler code
42 after the division process by the constant division unit
12; and the instruction relocation unit 21 for relocating
the instructions according to the dependence graph to
generate parallel assembler code.
Description

[0001] This application is based on application No. 
H9-235144 filed in Japan, the content of which is hereby
incorporated by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

[0002] The present invention relates to a program 
conversion apparatus for generating executable code 
for a VLIW processor by translating, linking, and editing
a source program written in a high-level language and 
a recording medium. In particular, the invention relates 
to a technique for dividing instructions including con- 
stants in a source program into parts and executing par- 
allel scheduling with the divided instructions. 

2. Related Art 

[0003] VLIW (Very Long Instruction Word) processors
include a plurality of operation units which execute a plu- 
rality of operations arranged in each VLIW in parallel. 
VLIWs are generated by program conversion appara- 
tuses, namely compilers, which detect parallelism in 
source programs at an operation level and perform 
scheduling of the source programs. 
[0004] VLIWs are, however, fixed-length instructions
and therefore are inefficient as code. That is, in many 
cases, it is necessary to insert redundant codes, such 
as no-operation codes ("nop" codes), into VLIWs. VLIW 
processors avoiding the occurrence of redundant areas 
in VLIWs are disclosed by Japanese Patent Applications
H09-159058 and H9-159059 of the same applicant
as this application.

[0005] Each of these VLIW processors includes a 
specialized constant buffer and a function for executing 
a program, in which a constant included in each instruc- 
tion is extracted as it is or is extracted and is divided into 
several partial digits, and is arranged in different VLIWs. 
In this specification, the term "divided constants" de- 
scribes these divided parts of a constant, or on occa- 
sion, entire constants. Each VLIW processor executes 
this program by accumulating divided constants in the
constant buffer (in a digit direction) to reconstruct the 
original constant and using the reconstructed original 
constant as a branch destination or an operand. Note 
that a VLIW processor having this function is hereinafter 
referred to as a "constant reconstructing VLIW proces- 
sor". A compiler for the constant reconstructing VLIW 
processor divides long constants in a program into di- 
vided constants and fills redundant areas in instructions 
with the divided constants, thereby improving the code 
efficiency of the program. 

[0006] However, a compiler has not yet been pro- 
posed which is suitable for the constant reconstructing 
VLIW processor. 


[0007] This compiler needs to divide long constants 
in a program into divided constants and to appropriately 
arrange the divided constants in a plurality of VLIWs. By 
doing so, the compiler generates executable code. This 
reduces redundant areas in instructions. This function
needs to ensure that each original constant is correctly 
reconstructed from the divided constants arranged in 
the plurality of VLIWs and is definitely used by the in- 
tended instruction. 


SUMMARY OF THE INVENTION 

[0008] In view of the stated problems, the object of the
present invention is to provide a compiler used for
constant reconstructing VLIW processors and to provide
executable code suitable for the constant reconstructing
VLIW processors.

[0009] To achieve the above object, the compiler of
the present invention converts an instruction sequence
composed of serially arranged instructions into a VLIW
sequence for a processor. The compiler includes: a
division step for dividing each instruction including a
constant in the instruction sequence into a plurality of
divided instructions; an analysis step for analyzing
dependence relations between the instructions in the
instruction sequence, including the divided instructions
generated in the division step, according to an execution
order of each instruction in the instruction sequence; and
a relocation step for relocating instructions in the
instruction sequence in compliance with the analyzed
dependence relations to generate VLIWs which are each
composed of a plurality of instructions that are
executable in parallel.

[0010] With the stated steps, each instruction including
a constant in a source program is divided into at least
two shorter instructions and parallel scheduling is
performed using the shorter instructions, so that a compiler
suitable for the constant reconstructing VLIW processor
can be realized. That is, the generation of redundant
areas in VLIWs is suppressed.

[0011] Here, the division step may include: an instruction
size judgement substep for performing an instruction
size judgement as to whether a size of an instruction
including a constant is equal to or smaller than a size of
each unit operation field in a VLIW; and a division
substep which, when the size of the instruction including the
constant is judged to be greater than the size of each
unit operation field, divides the instruction including the
constant into a plurality of divided instructions whose
sizes are each equal to or smaller than the size of each
unit operation field.

[0012] With the stated steps, only instructions whose
sizes are greater than operation fields of object VLIWs
are divided and are subjected to the parallel scheduling.
Therefore, even when a source program includes
instructions whose sizes are irrelevant to operation fields
of object VLIWs, the division process is performed only
on instructions which should be divided, reducing the
compiling time.

[0013] Here, in the division substep, the instruction
including the constant may be divided into one or more
instructions for storing the constant into a storage buffer
of the processor and an instruction for using the stored
constant.

[0014] With the stated process, all constants in in- 
structions are stored in a constant buffer. As a result, 
instructions including constants do not need to include 
the constants as operands so that a compiler suitable 
for VLIWs having small operation fields for specifying 
only operation codes can be realized. 
[0015] Here, in the division substep, the instruction
including the constant may be divided into one or more
instructions for respectively storing one or more divided 
constants into the storage buffer of the processor and 
an instruction for using the stored divided constants, 
where the divided constants are obtained by dividing the 
constant. 

[0016] With the stated process, only divided con- 
stants exceeding the size of constant areas in object 
VLIWs are pre-stored in the constant buffer and the fol- 
lowing instructions use the divided constants in the con- 
stant buffer. As a result, a compiler suitable for VLIWs 
having operation fields for specifying short operands 
can be realized. 

[0017] Here, the compiler may further include a com- 
bination step which, when two or more divided instruc- 
tions generated from a same instruction including a con- 
stant in the division substep are arranged in a same 
VLIW in the relocation step, combines the two or more 
divided instructions into one instruction. 
[0018] With the stated step, the inconvenient situation
can be precluded where an instruction which should
remain as a single instruction (an instruction which should
not be divided) is divided into two or more instructions
that are arranged in different operation fields of a VLIW
and executed, so that the execution speed is reduced. Also,
the combination of divided constant set instructions with
inappropriate divided constant use instructions can be
prevented.

[0019] Here, in the instruction size judgement substep,
when the final size has not been determined, the
instruction size judgement may be performed using an
assumed size for the constant. The compiler may further
include: a constant size determination step for linking a
plurality of VLIW sequences and determining a final size
of each constant; and an insertion step which, when the
final size is greater than the assumed size, generates
an instruction for storing into the storage buffer a divided
constant corresponding to a difference between the final
size and the assumed size and inserts the generated
instruction into a corresponding VLIW sequence.
[0020] With the stated steps, inconsistency during the
division and link processes due to label sizes which
have not been determined during compiling and
assembling can be avoided. Therefore, a compiler suitable for
program development which links object modules
generated in a plurality of compile units can be realized.
[0021] Here, in the instruction size judgement substep,
when the final size has not been determined, the
assumed size may be set to the maximum address size
or constant size manageable by the processor or to the
most commonly used address size or constant size.
[0022] With the stated process, inconsistency due to
the assumed sizes can be avoided so that the generation
of VLIWs including no-operation codes can be
suppressed.

[0023] Here, the compiler may re-execute the division
step after the constant size determination step, where
in the instruction size judgement substep in the
re-executed division step, the instruction size judgement is
performed in consideration of the final size determined
in the constant size determination step.
[0024] With the stated process, during the division of
a constant, the final label size is taken into account so
that the instruction insertion does not need to be
performed and executable code where the code size and
execution time are reduced can be generated.
[0025] Here, the compiler may re-execute the analy- 
sis step and the relocation step following the re-execut- 
ed division step. 

[0026] With the stated process, each constant is
divided appropriately and the optimization by the parallel
scheduling is repeated, so that executable code of
higher code efficiency can be generated.
[0027] Here, the executable code of the present
invention is a VLIW sequence for a processor which
executes a plurality of instructions in parallel, where a VLIW
in the VLIW sequence includes a constant to be stored
into a storage buffer of the processor implicitly indicated
by at least one VLIW in the VLIW sequence, and another
VLIW, which follows the VLIW and is the first to refer to
the storage buffer after the VLIW, includes an instruction
for using the constant in the storage buffer.
[0028] In the stated code, each constant and each
instruction using a constant are respectively divided into
at least two shorter constants and instructions, are
arranged in VLIWs, and are scheduled to be reconstructed
by the constant reconstructing processor. Therefore,
executable code suitable for the constant reconstructing
VLIW processor, namely executable code of high code
efficiency where the redundant areas in VLIWs are
suppressed, can be provided.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0029] These and other objects, advantages and
features of the invention will become apparent from the
following description thereof taken in conjunction with the
accompanying drawings which illustrate a specific
embodiment of the invention. In the drawings:

Fig. 1 is a block diagram showing an example of the
architecture of the processor 100 for which the compiler
of the present invention is used;
Figs. 2A and 2B show two formats of VLIWs generated
by the compiler of the present invention;
Figs. 3A to 3C show three formats for a 12-bit operation
field of VLIWs;
Figs. 4A and 4B show two formats for a 24-bit operation
field of VLIWs;
Fig. 5 is a block diagram showing the construction
of the compiler and related input/output data;
Fig. 6 is a flowchart showing the processing of the
constant division unit 12 of the compiler of the
present invention;
Fig. 7 is a flowchart showing the processing of the
dependence graph generation unit 20 of the compiler
of the present invention;
Fig. 8 is a flowchart showing the processing of the
instruction relocation unit 21 of the compiler of the
present invention;
Fig. 9 is a block diagram showing the detailed
construction of the linker unit 17 of the compiler of the
present invention;
Fig. 10 is a flowchart showing the processing of the
instruction insertion unit 23 of the linker unit 17;
Figs. 11A to 11C show a series of input and output
code and related data of Example 1;
Fig. 12 is a final dependence graph generated by
the dependence graph generation unit 20 when the
serial assembler code shown in Fig. 11B is inputted
into the parallel scheduling unit 13;
Fig. 13 is a block diagram showing the construction
of an ordinary compiler;
Fig. 14 shows a dependence graph generated by
the dependence graph generation unit 920 of the
ordinary compiler;
Fig. 15 shows VLIWs generated by the instruction
relocation unit 921 of the ordinary compiler;
Fig. 16 shows parallel assembler code generated
by the ordinary compiler;
Figs. 17A to 17G show a series of input and output
code and related data of Example 2;
Figs. 18A to 18E show a series of input and output
code and related data of Example 2 which are
generated by each element of the compiler of the
present invention when the generated location
information 40 is again input into the constant division
unit 12;
Figs. 19A to 19E show a series of input and output
code and related data of Example 3;
Figs. 20A to 20F show a series of input and output
code and related data of Example 4;
Figs. 21A and 21B show that the function of the
constant division unit 12 of the present invention can
be expressed from two different points of view; and
Fig. 22 shows a simplified content of a CD-ROM
recording a VLIW sequence generated by the compiler
of the present invention.


DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0030] An embodiment of the compiler of the present 
invention is described below with reference to the fig- 
ures. 

<Hardware Requirement> 

[0031] The present compiler is a cross compiler which 
translates and links a source program written in a high- 
level language to generate an executable program for 
a VLIW processor described later. The present compiler 
is achieved by a program which can be executed by a
general computer system, namely an engineering work 
station or a personal computer. Therefore, the present 
compiler or code generated by the present compiler can 
be stored and distributed in a recording medium, such 
as a floppy disk, a CD-ROM, or a semiconductor mem- 
ory. 

[0032] It should be noted here that the "compiler" in 
this specification should not be interpreted as a narrow- 
sense compiler which generates assembler code by 
translating source code written in a high-level language, 
but should be interpreted as a broad-sense compiler 
which additionally has a function for generating ma- 
chine-language object code by translating the assem- 
bler code and a function for linking the object code. 

<Target processor> 

[0033] Prior to the description of the present compiler, 
functions required by the target processor are described
first. 

(Architecture) 

[0034] The target processor is a constant reconstruct- 
ing VLIW processor described above. 
[0035] Fig. 1 is a block diagram showing an example 
architecture of the target processor. 
[0036] The target processor 100 is a processor which
executes fixed 32-bit VLIWs. The processor 100
includes the instruction fetch circuit 101, the instruction
register 102, three instruction decoders 103-105, the
constant buffer 107 which is a specialized shift register
for accumulating constants up to 32 bits to reconstruct
an original constant, the register group 108 including
sixteen 32-bit registers R0-R15, and two operation units
109 and 110 which execute their operations in parallel.
[0037] In executing a program in which divided
constants extracted from each original instruction are
arranged in different VLIWs, the VLIW processor 100 has
a function for accumulating the divided constants by
shifting them in the constant buffer 107 to reconstruct
the original constant. After the reconstruction, the VLIW
processor 100 uses the reconstructed constant as a
branch destination or an operand. Immediately after a
value stored in the constant buffer 107 is used (referred
to), the stored value is definitely cleared, that is,
replaced by 0s to prepare for the next accumulation.
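
The accumulate-then-clear behavior described above can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `ConstantBuffer` and its methods are hypothetical and are not taken from this specification, and the model ignores timing and the instruction decoders.

```python
class ConstantBuffer:
    """Toy model of the constant buffer 107 (a 32-bit shift register).

    Hypothetical illustration only; names are not from the specification.
    """

    WIDTH = 32

    def __init__(self):
        self.value = 0

    def shift_in(self, piece, bits):
        """Shift the buffer left by `bits` and place `piece` in the low bits."""
        mask = (1 << self.WIDTH) - 1
        piece &= (1 << bits) - 1
        self.value = ((self.value << bits) | piece) & mask

    def use(self):
        """Return the accumulated value, then clear the buffer to zeros."""
        value, self.value = self.value, 0
        return value


buf = ConstantBuffer()
buf.shift_in(0x876, 12)    # upper divided constant is stored first
buf.shift_in(0x543, 12)    # lower divided constant follows
reconstructed = buf.use()  # the buffer is cleared by this reference
```

In this sketch the order of the `shift_in` calls matters, which mirrors the requirement that divided constants reach the buffer in an appropriate order.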
[0038] A compiler for the VLIW processor 100 needs
to ensure that, during the execution of a program, all
divided constants are definitely stored in the constant
buffer 107 in appropriate order to reconstruct the original
constant and that the reconstructed constant is definitely
used by an intended instruction. That is, when dividing
a constant in each instruction and arranging the divided
constants in a plurality of VLIWs, the compiler needs to
perform scheduling in view of various considerations,
such as an execution order of the present instruction
and other related instructions, to generate VLIWs, so
that the original constant is definitely reconstructed from
the divided constants and the reconstructed constant is
used by the original instruction.

[0039] In this specification, a "VLIW" refers to code
specifying a group of operations to be executed in one 
cycle in parallel by the VLIW processor 100, while an 
"instruction" (except for the "VLIW") refers to the code 
specifying a single operation. Also, a "constant" refers 
to a value explicitly specified in an instruction (an imme- 
diate) and to a label which is determined during a link- 
age. 

[0040] Figs. 2A and 2B show two formats of VLIWs
executed by the VLIW processor 100 (a three-operation
format and a two-operation format).
[0041] Each VLIW is composed of three fields (the
8-bit first field 51, the 12-bit second field 52, and the
12-bit third field 53).

[0042] In the three-operation format shown in Fig. 2A, 
the first field 51 gives format information specifying a 
VLIW format and the first operation, the second field 52 
gives the second operation, and the third field 53 gives 
the third operation. 

[0043] In the two-operation format shown in Fig. 2B, 
the first field 51 gives the format information and the first 
operation and the 24-bit area composed of the second 
field 52 and the third field 53 gives the second operation. 
[0044] The format information specifies one of the two
formats and specifies one or more fields including only
constants to be accumulated in the constant buffer 107
(the second field 52, the third field 53, or both the second
and third fields 52 and 53).
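
The field widths given above (8, 12, and 12 bits) can be illustrated by packing and unpacking a 32-bit word. This is a minimal sketch, assuming that the first field 51 occupies the most significant bits; the specification states only the field widths here, so the bit positions and the function names `pack_vliw` and `split_vliw` are assumptions. The encoding of the format information inside the first field is not modeled.

```python
def pack_vliw(first, second, third):
    """Pack an 8-bit first field and two 12-bit fields into one 32-bit word.

    Assumed layout (not confirmed by the text): first field in bits 31-24,
    second field in bits 23-12, third field in bits 11-0.
    """
    return ((first & 0xFF) << 24) | ((second & 0xFFF) << 12) | (third & 0xFFF)


def split_vliw(word):
    """Split a 32-bit word back into its three fields."""
    return (word >> 24) & 0xFF, (word >> 12) & 0xFFF, word & 0xFFF


def second_operation_24(word):
    """In the two-operation format, fields 52 and 53 form one 24-bit area."""
    return word & 0xFFFFFF


word = pack_vliw(0xAB, 0x123, 0x456)
fields = split_vliw(word)
```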

[0045] The first operation is limited to a branch in- 
struction. A branch label (a branch destination address) 
for the branch instruction is specified by the constant 
buffer 107, the second field 52, the third field 53, or a
combination of such. 

[0046] The second and third operations are standard
transfer/arithmetic logic instructions that do not include
branch instructions. Note that instructions requiring
memory access, such as load or store instructions, are
limited to either the second operation or the third
operation. These standard transfer/arithmetic logic
instructions are either 12 bits long or 24 bits long. Although
basically expressed by 12 bits, the transfer/arithmetic
logic instructions are expressed by 24 bits when long
operands are included in the instructions.
[0047] Figs. 3A to 3C show three formats for a 12-bit
operation field. Fig. 3A shows a format for an
inter-register operation; Fig. 3B a format for an operation using
a register and a 4-bit constant; Fig. 3C a format for only
specifying a 12-bit divided constant to be stored in the
constant buffer 107.

[0048] Figs. 4A and 4B show two formats for a 24-bit
operation field. Fig. 4A shows a format for an operation
using a register and a 16-bit constant; Fig. 4B a format
for only specifying a 24-bit divided constant to be stored
in the constant buffer 107.


(Instruction Set)

[0049] The main instructions in the instruction set of
the VLIW processor 100 are described below.

{(Example 1) mov 0x1234, R0}

[0050] This instruction is a transfer instruction for
setting the 16-bit constant "0x1234" (where 0x represents
hexadecimal) in the register R0. This instruction is the same
as for a standard processor.

[0051] This instruction includes a 16-bit constant.
Accordingly, this instruction is used for a 24-bit operation.
That means another transfer/arithmetic logic instruction
cannot be arranged in the VLIW that includes this
instruction.

{(Example 2) sfst 0x1234:12u}

[0052] This instruction is a transfer instruction for
setting the upper 12 bits "0x1234:12u" of the 16-bit
constant "0x1234" in the constant buffer 107 by shifting the
content in the constant buffer 107, and is a divided
constant set instruction.

[0053] Here, the "divided constant set instruction" is
an instruction for accumulating divided constants in an
implicitly determined storage area (the constant buffer
107) and is an instruction unique to the VLIW processor
100. The divided constant can be all digits or partial
digits of a branch label used for a branch operation or is
partial digits of a constant used for a transfer/arithmetic
logic instruction.

[0054] In the final executable code for this instruction,
a field including this instruction includes only the 12-bit
divided constant, making the instruction a 12-bit
operation. The first field 51 includes format information
specifying the instruction. Accordingly, the VLIW including
this instruction can include only one more 12-bit
operation.

{(Example 3) mov 0x1234:4L, R0}

[0055] This instruction is a divided constant use
instruction. In more detail, it is a transfer instruction for
setting a 16-bit constant in the register R0 by combining
the divided constant stored in the constant buffer 107
as the upper 12 bits and the lower 4 bits of the constant
"0x1234" specified by this instruction (0x1234:4L) as the
lower 4 bits. Here, a "divided constant use instruction"
is an instruction for using divided constants stored in an
implicitly determined storage area (the constant buffer
107) and is an instruction unique to the VLIW processor
100.

[0056] The instruction of Example 3 is a 12-bit oper- 
ation. Accordingly, a VLIW including this instruction can 
include only one more 12-bit operation. 
[0057] It should be noted here that the execution re- 
sult of the instruction of Example 1 is the same as that 
obtained by sequentially executing the instructions of 
Example 2 and Example 3. Accordingly, the compiler 
can generate two 12-bit instructions, namely the divided
constant set instruction of Example 2 and the divided
constant use instruction of Example 3, instead of the
24-bit instruction of Example 1. By doing so, when a
redundant area is present in an instruction, the redundant
area can be filled with the divided constant set
instruction, thereby improving the code efficiency.
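
The equivalence stated in paragraph [0057] can be checked with a short simulation. The following is a sketch only: the Python functions `mov_imm16`, `sfst`, and `mov_use` are hypothetical stand-ins for the three example instructions, and registers and the constant buffer are modeled as plain dictionaries.

```python
def mov_imm16(regs, name, constant):
    """Example 1: a 24-bit operation that loads a 16-bit constant directly."""
    regs[name] = constant & 0xFFFF


def sfst(state, piece12):
    """Example 2: shift a 12-bit divided constant into the constant buffer."""
    state["buf"] = ((state["buf"] << 12) | (piece12 & 0xFFF)) & 0xFFFFFFFF


def mov_use(state, regs, name, low4):
    """Example 3: combine the buffered upper bits with a 4-bit constant,
    store the result in a register, and clear the buffer."""
    regs[name] = ((state["buf"] << 4) | (low4 & 0xF)) & 0xFFFFFFFF
    state["buf"] = 0


# Path A: the single 24-bit instruction of Example 1.
direct = {}
mov_imm16(direct, "R0", 0x1234)

# Path B: the two 12-bit instructions of Examples 2 and 3.
split = {}
state = {"buf": 0}
sfst(state, 0x1234 >> 4)                   # 0x1234:12u is 0x123
mov_use(state, split, "R0", 0x1234 & 0xF)  # 0x1234:4L is 0x4
```

Both paths leave the same value in R0, and the second path leaves the buffer cleared for the next accumulation.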

<Construction of Compiler>

[0058] Fig. 5 is a block diagram showing the construc- 
tion of the present compiler and related input/output da- 
ta. 

[0059] The present compiler can be roughly divided
into three groups. The first group generates the serial
assembler code 42 from the source code 41 written in
a high-level language (the compiler upstream part 10
and the assembler code generation unit 11). The second
group generates the parallel assembler code 43 and the
object code 44a-44b by subjecting the serial assembler
code 42 to the parallel scheduling which is unique to the
VLIW processor 100 (the constant division unit 12, the
parallel scheduling unit 13, the constant combination
unit 14, the code output unit 15, and the parallel
assembler unit 16). The third group generates the final
executable code 46 by linking a plurality of relocatable object
code 44a and 44b (the linker unit 17).
[0060] The relocation information 45a-45b and
location information 40 are related to labels and are input
into or output from the linker unit 17. The relocation in- 
formation 45a-45b and location information 40 are used 
to determine final label addresses and are also input into 
the constant division unit 12 for use when generating 
optimal code. The input/output data 40-45 and other in- 
termediate language data are stored on a hard disk of 
the computer system described above as files or are 
stored in a memory as temporary data. 

(Compiler Upstream Part 10) 

[0061] The compiler upstream part 10 reads high-level
language source code 41 saved in a file format,
performs syntactic analysis and semantic analysis on the
source code 41, and generates internal format code.
Furthermore, as necessary, the internal format code is
optimized so that the size of the finally generated
executable code and the execution time are reduced. The
processing of the compiler upstream part 10 is the same
as that of the compiler upstream part of an ordinary
compiler (a compiler for an ordinary processor, not for a
constant reconstructing VLIW processor).

(Assembler Code Generation Unit 11) 

[0062] The assembler code generation unit 11
generates the serial assembler code 42 from the internal
format code which was generated and optimized by the
compiler upstream part 10. Here, the "serial assembler
code" is serially arranged assembler instructions for
operations and is assembler code for an ordinary
processor (a processor including one operation unit). The
processing of the assembler code generation unit 11 is
the same as that of the assembler code generation unit
of an ordinary compiler.

(Constant Division Unit 12) 

[0063] The constant division unit 12 reads the
assembler code 42 generated by the assembler code generation
unit 11 and divides all long constant use instructions
included in the assembler code 42 into divided constant
set instructions and divided constant use instructions.
That is, long constant use instructions are replaced with
two types of instructions (divided constant set
instructions and divided constant use instructions). With the
two types of instructions, the same process as that of a
long constant use instruction is performed. During this
replacement process, depending on the length of the
long constant included in a long constant use
instruction, the long constant use instruction may be replaced
with two or more divided constant set instructions and
a divided constant use instruction.

[0064] Here, a "long constant" is a constant which is
too long to be written within a unit operation field in a
VLIW. More specifically, (1) when used by a branch
instruction, a long constant is a constant which cannot be
written within the first operation field (a branch label
expressed by one or more bits), and (2) when used by a
transfer/arithmetic logic instruction, a long constant is a
constant which cannot be written within a 12-bit
operation field shown in Fig. 3B (a constant expressed by 5
or more bits). Also, a "long constant use instruction" is
an instruction using a long constant.
[0065] On the other hand, a constant which can be
written within a unit operation field in a VLIW, which is
to say a constant that is used by a transfer/arithmetic logic
instruction and is expressed by 4 or fewer bits, is called a
"short constant". An instruction using a short constant
is called a "short constant use instruction". Note that the
divided constant use instruction includes a short
constant (a constant of 4 or fewer bits) and therefore is a short
constant use instruction.
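
The distinction between long and short constants given in paragraphs [0064] and [0065] can be written as a simple predicate. This is a sketch under assumptions: constants are treated as non-negative integers (signed constants are not discussed in this passage), and the function name `is_long_constant` is hypothetical.

```python
SHORT_CONSTANT_BITS = 4  # widest constant that fits the format of Fig. 3B


def bits_needed(constant):
    """Number of bits needed to express a non-negative constant (at least 1)."""
    return max(1, constant.bit_length())


def is_long_constant(constant, is_branch):
    """Classify a constant as long (True) or short (False).

    Branch labels of one or more bits are always long, because the first
    operation field cannot hold them. For transfer/arithmetic logic
    instructions, a constant of 5 or more bits is long.
    """
    if is_branch:
        return True
    return bits_needed(constant) > SHORT_CONSTANT_BITS
```

For example, `is_long_constant(0xF, is_branch=False)` is false, while the constant 0x10 needs 5 bits and is therefore long.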

[0066] The following is a detailed description of the
processing of the constant division unit 12.

[0067] Fig. 6 is a flowchart showing the processing of
the constant division unit 12.

[0068] The constant division unit 12 performs the
following process (steps S2-S4) for each instruction
included in the assembler code 42 (steps S1-S5).
[0069] Firstly, the constant division unit 12 determines 
the size of a label (the number of bits necessary to ex- 
press an address indicated by the label) included in an 
instruction to be processed (hereinafter simply referred 
to as a "target instruction") (step S2).
[0070] More specifically, when the size of a label can
be determined because the label is a local label which
is present in the same compile unit of the source code
41 or when the size is explicitly indicated by any
information such as the location information 40, the size is
added to the label as size information as it is. When,
however, the size of a label cannot be determined
because the label is an external label which is present in
another compile unit of the source code 41, a temporary
size is added to the label as label information. Note that
in this embodiment, the temporary size is predetermined
to be 16 bits which, according to statistical analysis, is
the most common address size.
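
Step S2 as described above can be sketched as a lookup with a fallback. The data structures below are assumptions made for illustration: `location_info` is a hypothetical mapping from label names to explicitly indicated sizes in bits, and `local_addresses` is a hypothetical mapping from local label names to their addresses. Only the 16-bit temporary size comes from the text.

```python
ASSUMED_LABEL_BITS = 16  # temporary size used in this embodiment


def label_size(label, local_addresses, location_info):
    """Return the size in bits to attach to a label (step S2).

    Explicitly indicated sizes and local labels give a determined size;
    any other label is treated as external and gets the temporary size.
    """
    if label in location_info:
        return location_info[label]
    if label in local_addresses:
        return max(1, local_addresses[label].bit_length())
    return ASSUMED_LABEL_BITS


local_addresses = {"loop_top": 0x54321}   # hypothetical local label
location_info = {"far_func": 24}          # hypothetical explicit size
sizes = {name: label_size(name, local_addresses, location_info)
         for name in ("loop_top", "far_func", "extern_sym")}
```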
[0071] The constant division unit 12 then determines 
whether the target instruction is a long constant use in- 
struction (step S3). 

[0072] When the target instruction is judged to be a 
long constant use instruction, the instruction is divided
into one or more divided constant set instructions and a 
divided constant use instruction (step S4). 
[0073] More specifically, when the long constant use 
instruction is a branch instruction, the long constant (an 
address indicated by a branch label) is divided into 
12-bit parts in order from the least significant bit. The 
constant division unit 12 generates one or more divided 
constant set instructions, which set obtained divided 
constants in the constant buffer 107 sequentially from 
the most significant bit, and a divided constant use in- 
struction (an instruction equivalent to the operation code 
of a branch instruction). The target instruction is re- 
placed with the generated one or more divided constant 
set instructions and a divided constant use instruction. 
When the long constant in the instruction has 19 bits,
for instance, the long constant is given leading zeros to
be 24 bits (a multiple of 12 bits) and is divided into the
upper 12 bits and the lower 12 bits. Three instructions
in total, namely a divided constant set instruction for the
upper 12-bit divided constant, a divided constant set
instruction for the lower 12-bit divided constant, and a
divided constant use instruction, are generated in this
order. The target instruction is replaced with these
generated instructions.
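The division of a branch constant described in paragraph [0073] can be illustrated with a minimal Python sketch. The function name split_branch_constant, its parameters, and the use of plain integers for divided constants are assumptions made for illustration; the specification does not define such an interface.

```python
def split_branch_constant(value, bit_length, part_bits=12):
    """Split a long constant into 12-bit parts, most significant part first.

    The constant is padded with leading zeros up to a multiple of part_bits,
    cut into parts starting from the least significant bit, and returned in
    the order in which the parts would be set in the constant buffer.
    """
    if value < 0 or value >= (1 << bit_length):
        raise ValueError("value does not fit in bit_length bits")
    n_parts = max(1, -(-bit_length // part_bits))  # ceiling division
    mask = (1 << part_bits) - 1
    parts_lsb_first = [(value >> (i * part_bits)) & mask for i in range(n_parts)]
    return parts_lsb_first[::-1]


# A 19-bit constant is padded to 24 bits and yields an upper and a lower part,
# so two divided constant set instructions precede the divided constant use
# instruction.
parts = split_branch_constant(0x7ABCD, bit_length=19)
print([hex(p) for p in parts])
```

Returning the parts most significant first mirrors the order in which the divided constant set instructions are generated.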

[0074] On the other hand, when the long constant use
instruction is a transfer/arithmetic logic instruction, the
constant division unit 12 first removes the equivalent of
a short constant (the lower 4 bits) and divides the
remaining long constant into 12-bit units starting from its
least significant bit. The constant division unit 12
generates one or more divided constant set instructions,
which set obtained divided constants in the constant
buffer 107 sequentially from the most significant bit, and
a divided constant use instruction (an instruction
including the operation code of the transfer/arithmetic logic
instruction and an operand indicating the short constant).
The target instruction is replaced with the generated
instructions. When the long constant in the instruction has
19 bits, for instance, the long constant is given leading
zeros to be 28 bits (12 bits x n + 4 bits) and is divided
into the upper 12 bits, the middle 12 bits, and the lower
4 bits. Three instructions in total, namely a divided
constant set instruction for the upper 12-bit divided
constant, a divided constant set instruction for the middle
12-bit divided constant, and a divided constant use
instruction including the lower 4-bit divided constant, are
generated in this order. The target instruction is replaced
with these generated instructions.
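The transfer/arithmetic logic case of paragraph [0074] differs from the branch case only in that the lower 4 bits stay in the divided constant use instruction. The following Python sketch illustrates this under the assumption that constants are plain non-negative integers; the name split_alu_constant and its parameters are hypothetical.

```python
def split_alu_constant(value, bit_length, short_bits=4, part_bits=12):
    """Return (parts_for_constant_buffer, short_constant).

    The lower short_bits bits remain in the divided constant use instruction;
    the remaining bits are cut into part_bits-wide units from the least
    significant bit and returned most significant unit first.
    """
    if value < 0 or value >= (1 << bit_length):
        raise ValueError("value does not fit in bit_length bits")
    short_constant = value & ((1 << short_bits) - 1)
    remaining = value >> short_bits
    remaining_bits = max(bit_length - short_bits, 0)
    n_parts = -(-remaining_bits // part_bits)  # ceiling division
    mask = (1 << part_bits) - 1
    parts_lsb_first = [(remaining >> (i * part_bits)) & mask for i in range(n_parts)]
    return parts_lsb_first[::-1], short_constant


# A 19-bit constant is treated as 28 bits (12 x 2 + 4): two divided constant
# set instructions plus a use instruction carrying the lower 4 bits.
parts, short_constant = split_alu_constant(0x7ABCD, bit_length=19)
print([hex(p) for p in parts], hex(short_constant))
```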
[0075] It should be noted here that two different methods
of dividing a long constant are used depending on
whether a long constant use instruction is a branch
instruction or a transfer/arithmetic logic instruction. This
is because a divided constant (a branch label) cannot
be inserted into the first field 51 where a branch
instruction is inserted, while a divided constant (a short
constant) can be inserted into the second field 52 or the third
field 53 where a transfer/arithmetic logic instruction is
inserted.

(Parallel Scheduling Unit 13) 

[0076] The parallel scheduling unit 13 receives serial
assembler code from which long constant use instructions
have been eliminated by the constant division unit
12. The parallel scheduling unit 13 detects the parallelism
of the serial assembler code at the assembler
instruction level and generates parallel assembler code
packed into VLIWs corresponding to the three-operation
format shown in Fig. 2A or the two-operation format
shown in Fig. 2B. Here, "parallel assembler code" is
assembler code for a VLIW processor, where a sequence
of parallel assembler instructions is used to specify a
plurality of operations that can be executed in parallel.
[0077] The parallel scheduling unit 13 includes the
dependence graph generation unit 20 and the instruction
relocation unit 21.

(Dependence Graph Generation Unit 20) 

[0078] The dependence graph generation unit 20
generates a dependence graph for the assembler code
output from the constant division unit 12. Here, the
"dependence graph" is a directed graph whose nodes
are the instructions and whose links (also called arrows
or edges) express the execution order relations between
assembler instructions; it regulates the execution order
of instructions in the assembler code.
[0079] The processing of the dependence graph gen- 
eration unit 20 is described in detail below. 
[0080] Fig. 7 is a flowchart showing the processing of 
the dependence graph generation unit 20. 
[0081] The dependence graph generation unit 20
repeats the processing described below (steps S12-S28)
for each instruction in the serial assembler code from
which long constant use instructions are eliminated by
the constant division unit 12 (steps S11-S29).
[0082] After generating the node of a target instruction
(step S12), the dependence graph generation unit 20
performs the following three processes: (1) the generation
of a dependence graph based on the exclusive control
over the register group 108 (steps S13-S18), (2) the
generation of a dependence graph based on the exclusive
control over the memory (steps S19-S24), and (3)
the generation of a dependence graph based on the
exclusive control over the constant buffer 107 (steps
S25-S28). These processes are described in more detail
below.

[0083] Firstly, the dependence graph generation unit
20 generates a node corresponding to a target instruction
(step S12). More specifically, the dependence
graph generation unit 20 generates information relating
the target instruction to the node.
[0084] The dependence graph generation unit 20
judges whether the target instruction refers to a register
(step S13). Here, "referring to a register" indicates that
the value of the register is read.
[0085] When a register is referred to, the previous register
definition instruction (a previous instruction which
defines the register) is specified and a link from the
specified instruction to the target instruction is established
(step S14). More specifically, information indicating
a link from the node corresponding to the specified
instruction to the node corresponding to the target
instruction is generated.

[0086] In this specification, a "register definition"
means that a value in a register is discarded and a new
value is set in the register. Also, the "previous instruction"
means the latest instruction before a target
instruction.

[0087] When a single instruction refers to a plurality
of registers, the dependence graph generation unit 20
repeats the steps S13 and S14 for each register. This
repetition may also apply to the following steps.
[0088] Next, the dependence graph generation unit
20 judges whether the target instruction defines a register
(step S15).

[0089] When the target instruction defines a register,
the previous register control instruction (a previous
instruction which controls the register) is specified and it
is judged whether the specified instruction is a register
definition instruction (step S16). Here, "register control"
means the definition and reference of a register.


[0090] When the judgement result is that the specified
instruction is a register definition instruction, a link from
the register definition instruction to the target instruction
is established (step S17).

[0091] On the other hand, when the specified instruction
is a register reference instruction, not a register definition
instruction, the previous register definition instruction
is specified and links are established to the target
instruction from each register reference instruction
(instructions for referring to the register) located between
the previous register definition instruction and the
target instruction (step S18).
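The register-related link generation (steps S13-S18) can be sketched as follows. This is a minimal Python illustration, not the flowchart of Fig. 7 itself: the Instr record, its reads/writes fields, and the function name register_links are hypothetical, and the instruction "X" is an invented example added to exercise step S18.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str           # hypothetical identifier, e.g. "403"
    reads: tuple = ()   # registers referred to (read)
    writes: tuple = ()  # registers defined (overwritten)


def register_links(code):
    """Return the set of (source, target) links for steps S13-S18."""
    links = set()
    last_def = {}           # register -> latest defining instruction
    readers_since_def = {}  # register -> instructions reading it since then
    for ins in code:
        # Steps S13-S14: a reference depends on the previous definition.
        for reg in ins.reads:
            if reg in last_def:
                links.add((last_def[reg], ins.name))
        # Steps S15-S18: a definition depends on the previous control
        # instruction of the same register.
        for reg in ins.writes:
            readers = readers_since_def.get(reg, [])
            if readers:
                # Previous control instruction is a reference (step S18).
                links.update((r, ins.name) for r in readers)
            elif reg in last_def:
                # Previous control instruction is a definition (step S17).
                links.add((last_def[reg], ins.name))
        # Record this instruction only after its own links are established.
        for reg in ins.reads:
            readers_since_def.setdefault(reg, []).append(ins.name)
        for reg in ins.writes:
            last_def[reg] = ins.name
            readers_since_def[reg] = []
    return links


code = [
    Instr("403", reads=("R0",), writes=("R1",)),       # mov R0, R1
    Instr("404", reads=("R1", "R2"), writes=("R2",)),  # add R1, R2
    Instr("X", reads=("R5",), writes=("R1",)),         # hypothetical: mov R5, R1
]
print(sorted(register_links(code)))
```

An instruction that both reads and defines a register is treated here as a definition of that register, which is why the reader list is reset whenever a register is written.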

[0092] The processing related to register references
and register definitions described above (steps
S13-S18) is also performed for the memory (steps
S19-S24).

[0093] Following this process, the dependence graph 
generation unit 20 judges whether the target instruction 
is a divided constant set instruction (step S25). 

[0094] When the target instruction is a divided constant
set instruction, a link from the previous constant
buffer control instruction to the target instruction is established
(step S26). Here, a "constant buffer control instruction"
is an instruction for controlling (defining and
referring to) the constant buffer 107, namely a divided
constant set instruction and a divided constant use
instruction.

[0095] Lastly, the dependence graph generation unit
20 judges whether the target instruction is a divided constant
use instruction (step S27).

[0096] When the target instruction is a divided con- 
stant use instruction, a link from the previous constant 
buffer control instruction to the target instruction is es- 
tablished (step S28). 

[0097] Note that there are differences between the
process for generating a dependence graph concerning
registers (steps S13-S18) and the process for generating
a dependence graph concerning the constant buffer
107 (steps S25-S28). This is because each divided constant
set instruction and divided constant use instruction
that accesses the constant buffer 107 is an instruction
for referring to and also defining the constant buffer
107. That is, the constant buffer 107 includes a shift register,
so that a divided constant set instruction is a shift
& set instruction (a reference and definition instruction).
Because the content of the constant buffer 107 is
cleared immediately after the content is referred to, a
divided constant use instruction is a reference and definition
instruction.
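Because every constant buffer control instruction both refers to and defines the constant buffer 107, steps S25-S28 reduce to chaining each such instruction to the one before it. A minimal Python sketch follows; the (name, kind) tuples and the kind labels "set", "use", and "other" are hypothetical encodings chosen for illustration.

```python
def constant_buffer_links(code):
    """Link each constant buffer control instruction to the previous one.

    code is a list of (name, kind) pairs, where kind is "set" for a divided
    constant set instruction, "use" for a divided constant use instruction,
    and "other" for anything that does not touch the constant buffer.
    """
    links = []
    previous = None  # latest constant buffer control instruction seen so far
    for name, kind in code:
        if kind in ("set", "use"):
            if previous is not None:
                links.append((previous, name))  # steps S26 and S28
            previous = name
    return links


# Hypothetical encoding of four instructions, two of which use the buffer.
code = [("403", "other"), ("404", "other"), ("405", "set"), ("406", "use")]
print(constant_buffer_links(code))
```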

(Instruction Relocation Unit 21) 

[0098] In compliance with the execution order indicated
by the dependence graph generated by the dependence
graph generation unit 20, the instruction relocation
unit 21 relocates instructions in the serial assembler
code output from the constant division unit 12 by packing
the instructions in VLIW units of the target processor
100. When doing so, the instruction relocation unit 21
relocates instructions so that the largest possible
number of instructions are executed in parallel, thereby
reducing the execution time.

[0099] The processing of the instruction relocation
unit 21 is described in detail below.
[0100] Fig. 8 is a flowchart showing the processing of
the instruction relocation unit 21.
[0101] The instruction relocation unit 21 repeats the 
following process (steps S42-S50) until all instructions 
in the serial assembler code are relocated (all instruc- 
tions are packed in VLIW units and are output from the 
parallel scheduling unit 13) (steps S41-S51). 
[0102] Firstly, the instruction relocation unit 21 checks
the dependence graph to classify all instructions that
can be output at the present time into an outputable
instruction group (step S42). Here, an "outputable instruction"
is an instruction which does not depend on a previous
instruction and so can be executed (outputted)
independently. Examples of outputable instructions
are (1) a target instruction to which there is no link in the
dependence graph and (2) a target instruction whose
link source node corresponds to instructions having
been output or to divided constant set instructions in the
dependence graph.

[0103] Also, the "outputable instruction group" is composed
of all instructions which can be output at the
present time. As described above, the outputable
instruction group includes target instructions whose link
source is a divided constant set instruction. This is because
even if a VLIW includes a divided constant set
instruction and a divided constant use instruction, these
instructions can be replaced with a single instruction by
the constant combination unit 14 as described later, so
that these instructions can be executed without causing
any problems.
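The rule for forming the outputable instruction group (step S42) can be sketched in Python as below. The function name outputable_group and its arguments are hypothetical; the sample data uses the instruction numbers 403-406 and the two links 403→404 and 405→406 of the dependence graph of Fig. 12.

```python
def outputable_group(order, links, already_output, set_instructions):
    """Return the instructions that may be output at the present time.

    An instruction qualifies when it has not been output yet and every link
    source is either already output or a divided constant set instruction.
    """
    preds = {name: set() for name in order}
    for src, dst in links:
        preds[dst].add(src)
    return [
        name
        for name in order
        if name not in already_output
        and all(p in already_output or p in set_instructions for p in preds[name])
    ]


order = ["403", "404", "405", "406"]
links = [("403", "404"), ("405", "406")]
set_instructions = {"405"}  # 405 is the divided constant set instruction

print(outputable_group(order, links, set(), set_instructions))
print(outputable_group(order, links, {"403", "405"}, set_instructions))
```

Instruction 406 is admitted in the first call even though 405 has not been output, because its only link source is a divided constant set instruction.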

[0104] After an outputable instruction group is generated
(step S42), a process for selecting and deleting one
instruction from the group (steps S45-S48) is repeated
until all instructions have been selected and deleted
from the group (steps S43-S49).
[0105] Note that when one VLIW is generated, the 
process exits from the loop (steps S43-S49), generates 
another outputable instruction group (step S42), and re- 
peats the same process (steps S45-S48) (steps 
S43-S49). This is because by the time instructions com- 
posing the generated VLIW are deleted from the out- 
putable instruction group, new outputable instructions 
may have been generated. 

[0106] First, the instruction relocation unit 21 judges 
whether a VLIW can be made from instructions in an 
output schedule instruction group (whether any more in- 
structions can be inserted into the VLIW) (step S44). 
[0107] Here, "output schedule instructions" are instructions
which can be included in a generated single
VLIW and are executable in parallel, while the "output
schedule instruction group" temporarily holds instructions
to accumulate the maximum number of output
schedule instructions (the maximum number of output
schedule instructions which can be arranged in a VLIW).
That is, only instructions shifted from the outputable
instruction group to the output schedule instruction group
are output from the parallel scheduling unit 13 as
instructions which compose a generated VLIW.
[0108] When judging that a VLIW cannot be generated
in step S44, the instruction relocation unit 21 selects
an instruction from the outputable instruction group that
will result in the execution time and the code size being
reduced (step S45). More specifically, the instruction
relocation unit 21 calculates estimates for the total number
of VLIWs generated from a basic block by referring to
the dependence graph and selects the instruction that
results in the lowest estimate.

[0109] After this, the instruction relocation unit 21
judges whether the selected instruction (a target instruction)
can be included in the output schedule instruction
group (step S46). Here, if one or more instructions have
been included in the output schedule instruction group
by this time, the instruction relocation unit 21 judges
whether the included instructions and the target instruction
can compose a VLIW (whether a VLIW can be output)
(step S46).

[0110] For instance, when there is a 12-bit instruction
in the output schedule instruction group and the instruction
selected in step S45 is 24 bits long, these instructions
cannot compose a VLIW. Therefore, the instruction
relocation unit 21 judges that these instructions cannot
be output. When a divided constant set instruction of the
link source of the current node has not been output and
is not present in the output schedule instruction group,
the instruction relocation unit 21 judges that instructions
cannot be output. This prevents the generation of
erroneous code, where a divided constant use instruction
is output without divided constant set instructions.
[0111] When the instruction relocation unit 21 judges
that an instruction included in the output schedule
instruction group and a target instruction can compose a
VLIW in step S46, the target instruction is transferred
from the outputable instruction group to the output
schedule instruction group (steps S47 and S48).
[0112] On the other hand, when the instruction relocation
unit 21 judges that an instruction included in the
output schedule instruction group and a target instruction
cannot compose a VLIW in step S46, the target
instruction cannot be output at this time and so is eliminated
from the outputable instruction group (step S48).
[0113] When the instruction relocation unit 21 judges 
that a VLIW can be made from instructions in the output 
schedule instruction group in step S44, the instructions 
are eliminated from the output schedule instruction 
group and are output as a VLIW (step S50). Note that 
when the process proceeds from step S49 to step S50, 
there are cases where all operation fields of a VLIW can- 
not be filled with instructions remaining in the output 
schedule instruction group. In this case, a VLIW whose
blank operation fields are filled with no-operation
instructions (nop) is output.

[0114] In this manner, the serial assembler code input
into the parallel scheduling unit 13 is packed into VLIWs
to generate parallel assembler code which is then
output.
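The overall packing loop can be approximated by the simplified Python sketch below. It is not the flowchart of Fig. 8: the selection heuristic of step S45 (lowest estimated number of VLIWs) is replaced by plain program order, the size compatibility check of step S46 is replaced by a fixed slot count, and the names schedule and slots are hypothetical. It assumes every link points from an earlier to a later instruction in program order.

```python
def schedule(order, links, set_instructions, slots=2):
    """Pack instructions into VLIWs of `slots` operations, padding with nop."""
    preds = {name: set() for name in order}
    for src, dst in links:
        preds[dst].add(src)
    done = set()
    vliws = []
    while len(done) < len(order):
        # Outputable group: link sources already output, or divided constant
        # set instructions (which may share a VLIW with their use instruction).
        group = [
            n for n in order
            if n not in done
            and all(p in done or p in set_instructions for p in preds[n])
        ]
        bundle = []
        for n in group:
            if len(bundle) == slots:
                break
            # A use instruction may only be placed if its set instruction has
            # been output or already sits in the VLIW being built.
            if all(p in done or p in bundle for p in preds[n]):
                bundle.append(n)
        if not bundle:
            raise ValueError("links must point forward in program order")
        done.update(bundle)
        vliws.append(bundle + ["nop"] * (slots - len(bundle)))
    return vliws


order = ["403", "404", "405", "406"]
links = [("403", "404"), ("405", "406")]
print(schedule(order, links, set_instructions={"405"}))
```

With the two links of the dependence graph of Fig. 12, the sketch pairs 403 with 405 in the first VLIW and 404 with 406 in the second.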

(Constant Combination Unit 14) 

[0115] When one or more divided constant set instructions
and a divided constant use instruction generated
from the same long constant use instruction by the constant
division unit 12 are packed into the same VLIW (in
the same cycle) by the parallel scheduling unit 13, the
constant combination unit 14 replaces these instructions
with a long constant use instruction which it obtains
by combining these instructions. Similarly, when a plurality
of divided constant set instructions generated from
the same long constant use instruction are packed into
the same VLIW (in the same cycle), the constant combination
unit 14 replaces these instructions with a divided
constant set instruction which it obtains by combining
these instructions. This corresponds to a case where the
constant division unit 12 need not have divided a long
constant (need not have arranged divided constants in
a plurality of VLIWs).
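At the level of constant values, the combination is the inverse of the division performed by the constant division unit 12. The Python sketch below shows only that arithmetic, assuming 12-bit divided constants and an optional 4-bit short constant; the function name combine_constant is hypothetical, and the instruction encoding itself is not modelled.

```python
def combine_constant(set_parts, short_constant=None, part_bits=12, short_bits=4):
    """Rebuild a long constant from divided constants.

    set_parts holds the divided constants of the set instructions, most
    significant first. short_constant is the operand kept in a
    transfer/arithmetic logic use instruction, or None for a branch.
    """
    value = 0
    for part in set_parts:
        value = (value << part_bits) | part
    if short_constant is not None:
        value = (value << short_bits) | short_constant
    return value


# Transfer/arithmetic logic case: two set parts plus a 4-bit short constant.
print(hex(combine_constant([0x7, 0xABC], short_constant=0xD)))
# Branch case: two set parts and no short constant.
print(hex(combine_constant([0x7A, 0xBCD])))
```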

(Code Output Unit 15) 

[0116] The code output unit 15 converts internal-format
assembler code that has been combined (replaced)
by the constant combination unit 14 to text-format
assembler code, and outputs the converted code as files
(the parallel assembler code 43).

(Parallel Assembler Unit 16)

[0117] The parallel assembler unit 16 converts the
parallel assembler code 43 output from the code output
unit 15 into a machine language dedicated to the VLIW
processor 100 for which the present compiler is used,
and generates the object code 44a-44b and the relocation
information 45a-45b. During this process, the format
information to be located in the first field 51 of a
VLIW is determined. In the case of a VLIW including one
or more divided constant set instructions, for instance,
the parallel assembler unit 16 generates machine code
for the fields having only a divided constant and the format
information indicating the fields.
[0118] Each of the relocation information 45a-45b is
composed of information indicating the name of a label
for each object code 44a, the address of an instruction
using the label, and the size of the label. This label size
is the size determined by the constant division unit 12
(the label size determined in step S2 shown in Fig. 6),
and is a temporary value (16 bits in the above example)
in the case of an external label.


(Linker Unit 17) 

[0119] The linker unit 17 links the plurality of relocat- 
able object code 44a-44b generated in different compile 
units, determines undetermined labels included in the 
object code, and generates the executable code 46 and 
the relocation information 40 for the VLIW processor 
100. 

[0120] Fig. 9 is a block diagram showing the detailed 
construction of the linker unit 17. 
[0121] The linker unit 17 includes the label address
calculation unit 22, the instruction insertion unit 23, and
the output unit 24.

[0122] The label address calculation unit 22 calculates
an address of each label after the plurality of relocatable
object code 44a-44b input into the linker unit is
linked. By doing so, the size of each label is also determined.
This process is the same as that by a label address
calculation unit of an ordinary compiler.
[0123] When the size of a label calculated by the label
address calculation unit 22 is greater than the size indicated
by the relocation information 45a-45b, which is the
size determined by the constant division unit 12, the
instruction insertion unit 23 inserts the required divided
constant set instruction to cope with the situation.
[0124] Fig. 10 is a flowchart showing the processing 
of the instruction insertion unit 23. 
[0125] The instruction insertion unit 23 sequentially 
fetches each label out of the object code 44a-44b input 
into the linker unit 17 and repeats the following process 
(steps S62-S64) for each of the fetched labels (steps 
S61-S65). 

[0126] Firstly, relocation information of a fetched label
(a target label) is read from the relocation information
45a-45b input into the linker unit 17 (step S62).
[0127] Then, the linker unit 17 judges whether the size
calculated by the label address calculation unit 22 is
greater than that of the target label indicated by the read
relocation information (step S63).
[0128] If so, one or more divided constant set instructions
are generated to store divided constants corresponding
to the difference between these sizes. A new
VLIW including the generated divided constant set
instructions and a no-operation instruction is inserted
immediately before the VLIW including the instruction that
uses the target label (step S64).

[0129] By doing so, even if the size of a temporary
label determined by the constant division unit 12 is
smaller than the actually required size, the difference
between these sizes is recognized and a necessary
treatment is given.
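The size comparison of step S63 and the generation of additional divided constants in step S64 might look like the following Python sketch. It rests on assumptions that the text does not spell out: the label size is taken to be the bit length of the resolved address, the additional divided constants are taken to be the address bits above the recorded size, and the name extra_set_constants is hypothetical.

```python
def extra_set_constants(address, recorded_bits, part_bits=12):
    """Return the divided constants to be set by the inserted VLIW.

    An empty list means the recorded (possibly temporary) label size was
    sufficient. Otherwise the bits above recorded_bits are cut into
    part_bits-wide units and returned most significant unit first.
    """
    actual_bits = max(address.bit_length(), 1)
    if actual_bits <= recorded_bits:
        return []  # step S63: the size was large enough, nothing to insert
    upper = address >> recorded_bits
    n_parts = -(-(actual_bits - recorded_bits) // part_bits)  # ceiling division
    mask = (1 << part_bits) - 1
    return [(upper >> (i * part_bits)) & mask for i in reversed(range(n_parts))]


# An external label assumed to be 16 bits that resolves to a 17-bit address
# needs one additional divided constant set instruction.
print(extra_set_constants(0x12345, recorded_bits=16))
# An address that fits in the temporary size needs none.
print(extra_set_constants(0xFFFF, recorded_bits=16))
```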

[0130] The output unit 24 generates the location
information 40 indicating the size of each label determined
by the label address calculation unit 22 and a list of
instructions that refer to the labels, and outputs the location
information 40 with the executable code 46 obtained
after the processing of the instruction insertion unit 23.
<Operation of the Compiler>

[0131] The following is a description of the operation
of characteristic elements of the present compiler for
specific instructions.

<Example 1> 

[0132] The following description concerns an operation
of the constant division unit 12 and the parallel
scheduling unit 13 where the serial assembler code 42
shown in Fig. 11A is generated by the assembler code
generation unit 11 and is input into the constant division
unit 12.

[0133] The assembler code 401 and 402 shown in
Fig. 11A are described below.

{(Instruction 401) add R1, R2}

[0134] The value in the register R1 is added to the
value in the register R2 and the result is stored in the
register R2.

{(Instruction 402) Ld (LabeL), R3}

[0135] The value stored in the area with the memory 
address indicated by the label "LabeL" is loaded into the 
register R3. 

(Constant Division Unit 12)

[0136] The operation of the constant division unit 12
when the serial assembler code 42 shown in Fig. 11A is
input is described below with reference to the flowchart
shown in Fig. 6.
[0137] The constant division unit 12 repeats the
processing for dividing long constants (steps S1-S5) for
each of three instructions shown in Fig. 11A. However,
in this example, the instructions 400 and 401 include
neither labels nor long constants and are therefore not
processed.

[0138] The constant division unit 12 cannot determine
the size of the label "LabeL" in the instruction 402 and
so assumes the size is 16 bits (step S2).
[0139] Accordingly, the constant division unit 12 judges
that the instruction 402 is a long constant use instruction
(step S3) and replaces this long constant use
instruction 402 with one or more divided constant set
instructions and a divided constant use instruction (step
S4).
[0140] Fig. 11B shows code generated by the constant
division unit 12 when the serial assembler code 42
shown in Fig. 11A is input.

[0141] As shown in this figure, the long constant use
instruction 402 in Fig. 11A is replaced with the divided
constant set instruction 405 and the divided constant
use instruction 406.


(Dependence Graph Generation Unit 20) 

[0142] The following description is based on the assumption
that the serial assembler code shown in Fig.
11B is input into the parallel scheduling unit 13. The operation
of the dependence graph generation unit 20 in
this case is described below with reference to the flowchart
shown in Fig. 7.

[0143] The dependence graph generation unit 20 repeats
the same process for each of the four instructions
shown in Fig. 11B (steps S11-S29).
[0144] Fig. 12 shows the dependence graph 600 generated
by the dependence graph generation unit 20 in
the case where the serial assembler code shown in Fig.
11B is input into the parallel scheduling unit 13.

{(Instruction 403) mov R0, R1}

[0145] The dependence graph generation unit 20
generates the node 601 corresponding to this instruction
403 (step S12).

[0146] After this, because this instruction 403 refers
to the register R0, a link from a previous instruction defining
the register R0 should be established (steps S13
and S14). However, there are no preceding instructions,
so that this link cannot be established.
[0147] Similarly, because this instruction 403 defines
the register R1, a previous instruction controlling the
register R1 should be specified (steps S15 and S16).
However, there are no preceding instructions, so that a
link cannot be established.

[0148] It should be noted here that this example uses
only one basic block (a process routine having one entrance
and one exit) for ease of explanation. However,
when a dependence graph is generated for a program
including a plurality of basic blocks, the plurality of basic
blocks can be processed by using virtual nodes indicating
preceding basic blocks and following basic blocks.

{(Instruction 404) add R1, R2}

[0149] The instruction 404 is to be processed next and
so the dependence graph generation unit 20 generates
node 603 corresponding to this instruction 404 (step
S12).

[0150] Because the instruction 404 refers to the register
R1, the dependence graph generation unit 20 specifies
the previous instruction 403 defining the register R1
and establishes the link 602 from the instruction 403 to
the instruction 404 (steps S13 and S14).
[0151] Because the instruction 404 defines the register
R2, the previous instruction controlling the register
R2 should be specified (steps S15 and S16). However,
there is no such preceding instruction, so that a link is
not established.
{(Instruction 405) stst LabeL:12u}

[0152] The instruction 405 is to be processed next and
so the dependence graph generation unit 20 generates
the node 604 corresponding to this instruction 405 (step
S12).

[0153] The instruction 405 does not control a register
or memory and therefore is not subjected to the link
process in steps S13-S24.

[0154] The instruction 405 is a divided constant set 
instruction, so that the dependence graph generation 
unit 20 attempts to specify a previous constant buffer 
control instruction (steps S25 and S26). However, there 
is no such preceding instruction, and therefore a link is 
not established. 

{(Instruction 406) Ld (LabeL:4L), R3}

[0155] Finally, the instruction 406 is to be processed
and the dependence graph generation unit 20 generates
the node 606 corresponding to this instruction 406
(step S12).

[0156] Because the instruction 406 defines the register
R3, the dependence graph generation unit 20 attempts
to specify a previous instruction controlling the
register R3 (steps S15 and S16). However, there is no
such preceding instruction, and therefore a link is not
established.

[0157] The instruction 406 is a divided constant use
instruction so that the dependence graph generation
unit 20 specifies the previous constant buffer control
instruction 405 and establishes the link 605 from the
instruction 405 to the instruction 406 (steps S27 and S28).
[0158] In this manner, the link 602 from the instruction
403 to the instruction 404 and the link 605 from the
instruction 405 to the instruction 406 are established as
shown in Fig. 12.

(Instruction Relocation Unit 21)

[0159] In compliance with the execution order indicated
by the dependence graph shown in Fig. 12, the instruction
relocation unit 21 relocates the serial assembler
code shown in Fig. 11B in parallel. The following is
a description of the operation of the instruction relocation
unit 21 in this case, with reference to the flowchart
shown in Fig. 8.

[0160] Until all of the four instructions 403-406 shown
in Fig. 11B are output, the instruction relocation unit 21
repeats the scheduling cycle (steps S41-S51) including
a process for generating the outputable instruction
group (step S42) and a process for consuming one instruction
in the generated outputable instruction group
at a time (steps S43-S50).

(First Scheduling Cycle) 

[0161] In the first scheduling cycle, the instruction relocation
unit 21 generates a group composed of the
three instructions 403, 405, and 406 as the outputable
instruction group (step S42). This is because the instructions
403 and 405 are instructions without links from other
nodes to their nodes, and the instruction 406 corresponds
to a node whose link source node corresponds
to a divided constant set instruction.
[0162] There is no instruction in the output schedule
instruction group, so that the instruction relocation unit
21 judges that a VLIW cannot be made from instructions
in the output schedule instruction group in step S44 and
the process proceeds to the first consumption cycle by
selecting an optimal instruction (step S45). In this example,
the instruction 403 is selected.
[0163] Because the output schedule instruction group
includes no instruction, the instruction relocation unit 21
moves the selected instruction 403 into the output
schedule instruction group (step S47) and eliminates
the instruction 403 from the outputable instruction group
(step S48).

[0164] At this time, the instructions 405 and 406 remain
in the outputable instruction group, so that the
process proceeds to the second consumption cycle
(steps S44-S48).

[0165] The output schedule instruction group does
not include enough instructions to fill a VLIW at the
present time, so that the instruction relocation unit 21 judges
that a VLIW cannot be made from instructions in the output
schedule instruction group in step S44. In this example,
the instruction relocation unit 21 selects the instruction
405 as an optimal instruction (step S45).
[0166] Both the selected instruction 405 and the instruction
403 in the output schedule instruction group are 12-bit
instructions and so may construct a VLIW. Therefore, the
instruction relocation unit 21 moves the instruction 405 into
the output schedule instruction group (step S47) and
eliminates the instruction 405 from the outputable
instruction group (step S48).

[0167] At this time, only the instruction 406 remains
in the outputable instruction group and the output schedule
instruction group includes the instructions 403 and
405. As a result, the instruction relocation unit 21 judges
that a VLIW can be made from instructions in the output
schedule instruction group in step S44, eliminates these
instructions 403 and 405 from the output schedule
instruction group, and outputs a VLIW including these
instructions (step S50).

[0168] When the instruction relocation unit 21 selects
the instruction 406, instead of the instruction 405, as an
optimal instruction in the second consumption cycle
(step S45), the divided constant set instruction 405 from
which a link is established to the node of the instruction
406 has not been output and is not included in the output
schedule instruction group. Therefore, the instruction
relocation unit 21 judges that an instruction included in
the output schedule instruction group and a target instruction
cannot compose a VLIW in step S46 and the
instruction 406 is eliminated from the outputable instruction
group. As a result, the same VLIW (the VLIW including
the instructions 403 and 405) is output in this
case.

[0169] Fig. 11C shows the VLIWs (parallel assembler code)
generated by the instruction relocation unit 21 when the serial
assembler code shown in Fig. 11B is input into the parallel
scheduling unit 13. Note that the code in the first field 51 of
each VLIW is omitted for ease of explanation.

[0170] The first scheduling cycle generates the VLIW 407 shown in
Fig. 11C.

(Second Scheduling Cycle) 

[0171] The second scheduling cycle starts with the in- 
struction 406 remaining in the outputable instruction 
group. 

[0172] The instruction relocation unit 21 newly adds 
the instruction 404 to the outputable instruction group 
(step S42). This is because the instruction 403 corre- 
sponds to the node from which a link is established to 
the node of the instruction 404. As a result, an outputa- 
ble instruction group composed of the instructions 404 
and 406 is generated. 

[0173] There is no instruction in the output schedule instruction
group, so the instruction relocation unit 21 judges in step S44
that a VLIW cannot be made from the instructions in the output
schedule instruction group, and the process proceeds to the first
consumption cycle, in which an optimal instruction is selected
(step S45). In this example, the instruction relocation unit 21
selects the instruction 404.

[0174] There is no instruction in the output schedule instruction
group in this case, so the instruction relocation unit 21 moves
the instruction 404 into the output schedule instruction group
(step S47) and eliminates the instruction 404 from the outputable
instruction group (step S48).

[0175] At this time, only the instruction 406 remains in the
outputable instruction group. The process proceeds to the second
consumption cycle in the same manner as in the first scheduling
cycle, so that the instruction 406 is also transferred from the
outputable instruction group to the output schedule instruction
group (steps S44-S48).

[0176] The output schedule instruction group now includes the
instructions 404 and 406, so the instruction relocation unit 21
judges in step S44 that a VLIW can be made from the instructions
in the output schedule instruction group. The instructions 404
and 406 are eliminated from the output schedule instruction group
and are output as the second VLIW (step S50). That is, the second
scheduling cycle generates the VLIW 408 shown in Fig. 11C.

[0177] In this way, the instruction relocation unit 21 packs all
instructions input into the parallel scheduling unit 13 into
VLIWs, which it then outputs (steps S41-S51). The instruction
relocation unit 21 then terminates its scheduling process.
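
The two scheduling cycles of Example 1 can be illustrated by the
following minimal sketch in Python. It is a simplification and not
the procedure of steps S41-S51 itself: it assumes that two
operation fields are available per VLIW, ignores instruction
sizes, and treats every link in the dependence graph as requiring
the predecessor to be output in an earlier VLIW. The function name
`schedule`, the parameter `slots`, and the dictionary form of the
dependence graph are hypothetical and serve only this
illustration.

```python
def schedule(instructions, deps, slots=2):
    """Greedily pack instructions into VLIWs of `slots` fields each.

    instructions: instruction numbers in serial order.
    deps: maps an instruction to the set of instructions it depends on.
    """
    done = set()                 # instructions already output in a VLIW
    remaining = list(instructions)
    vliws = []
    while remaining:
        # Outputable group: every predecessor was output in an earlier VLIW.
        outputable = [i for i in remaining if deps.get(i, set()) <= done]
        if not outputable:
            raise ValueError("dependence graph contains a cycle")
        # Output schedule group: at most `slots` outputable instructions.
        word = outputable[:slots]
        vliws.append(word)
        done.update(word)
        remaining = [i for i in remaining if i not in word]
    return vliws


# Links taken from Example 1: 403 -> 404 and 405 -> 406.
example_deps = {404: {403}, 406: {405}}
example_vliws = schedule([403, 404, 405, 406], example_deps)
```

With the links of Example 1, the sketch places the instructions
403 and 405 in the first VLIW and the instructions 404 and 406 in
the second, which corresponds to the VLIWs 407 and 408.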

(Comparison with Ordinary Compiler) 

[0178] The two VLIWs shown in Fig. 11C are generated from the
serial assembler code shown in Fig. 11A by the processing of the
constant division unit 12 and the parallel scheduling unit 13.
This process is compared with the case of an ordinary compiler to
demonstrate the characteristics of the present compiler.
[0179] Fig. 13 is a block diagram showing the construction of the
ordinary compiler.
[0180] While the basic functions of the ordinary com- 
piler are the same as those of the compiler of the em- 
bodiment, the ordinary compiler does not have the func- 
tions equivalent to the constant division unit 12 and the 
constant combination unit 14. Therefore, the ordinary 
compiler does not have the functions equivalent to the 
other elements 910-917, 920, and 921. 
[0181] Therefore, when the assembler code generation unit 911
generates the serial assembler code shown in Fig. 11A, for
instance, the serial assembler code is input into the parallel
scheduling unit 913 as it is. As a result, the dependence graph
generation unit 920 generates the dependence graph 925 shown in
Fig. 14.
[0182] The instruction relocation unit 921 relocates the
instructions shown in Fig. 11A according to the dependence graph
925. As shown in Fig. 14, the instructions 400 and 401 depend on
each other and so cannot coexist (cannot construct a VLIW). The
instruction 402 is 24 bits long, so the instruction 402 cannot
coexist with the other instructions 400 and 401. Therefore, the
instruction relocation unit 921 generates the three VLIWs 930-932
shown in Fig. 15.

[0183] As can be seen by comparing Figs. 15 and 11C, the code
size of the parallel assembler code generated by the ordinary
compiler is greater than that of the embodiment by one VLIW.
Therefore, one more cycle is required for the execution of the
code generated by the ordinary compiler.

[0184] This is because the compiler of the embodiment divides the
VLIW 932 in Fig. 15 into small instructions (one or more divided
constant set instructions and a divided constant use instruction)
and the small instructions are arranged into the VLIWs 930 and
931 to fill the redundant areas in these VLIWs.

<Example 2> 

[0185] The following is a description of the operation of the
linker unit 17 and the optimization process in the case where the
serial assembler code 42 shown in Fig. 17A is generated by the
assembler code generation unit 11 and is input into the constant
division unit 12.

(Linker Unit 17) 

[0186] Figs. 17A-17G show a series of specific code and related
information for the explanation of the operation of the linker
unit 17.

[0187] Fig. 17A shows the serial assembler code 42 generated by
the assembler code generation unit 11 of Example 2. Fig. 17B
shows the code generated by the constant division unit 12 to
which the serial assembler code 42 is input. Fig. 17C shows the
parallel assembler code generated by the parallel scheduling unit
13 to which the generated code is input. Figs. 17D and 17E show
the object code 44a and the relocation information 45a generated
by the parallel assembler unit 16 to which the generated parallel
assembler code is input. Figs. 17F and 17G show the executable
code 46 and the location information 40 generated by the linker
unit 17 to which the object code 44a and the relocation
information 45a are input.

[0188] Note that in this example, for the detailed explanation of
the code generated by the assembler code generation unit 11, the
constant division unit 12, and the parallel scheduling unit 13,
Figs. 17A-17C show additional information indicated by codes
following the sign "@", where the additional information is
generated together with each instruction. This additional
information includes an identifier for specifying each
instruction and information related to divided constants.
[0189] The additional information "@ID numeral" shown in Fig. 17A
is the identifier of the instruction (an instruction identifier)
in the same row. In Fig. 17B, the additional information "LbU12"
and "LbL4" represent the upper 12 bits and the lower 4 bits of
the label "LabeL", respectively, the additional information "S16"
(the size information) indicates that the label "LabeL" has been
divided with its size assumed to be 16 bits, and the additional
information "M" indicates that the present instruction is the
first of the instructions that store divided pieces of the label
"LabeL" into the constant buffer 107.

[0190] The instruction "DS" 414 shown in Fig. 17A is a dummy
instruction for reserving a storage area (4 bytes) for storing
the label "LabeL".
[0191] The object code 44a shown in Fig. 17D and the relocation
information 45a shown in Fig. 17E are input into the linker unit
17.

[0192] In Fig. 17D, the "location information" in the object code
indicates the relative address of each instruction as an offset
(in byte units) from the start of a specific memory area (a
segment or a section). The sign "0x" indicates that the number
following the sign is expressed in hexadecimal. The signs
"LabeL:12u" and "LabeL:4L" represent the upper 12 bits and the
lower 4 bits of the label "LabeL", respectively, the two pieces
being divided constants.

[0193] As shown in Fig. 17E, the relocation information 45a is
composed of the "label", the "location information" indicating
the location of an instruction referring to the label, and the
"additional information" accompanying the instruction. Here, the
location information is composed of the address of a VLIW and a
numeral specifying the location of the unit operation field in
the VLIW that includes the instruction referring to the label,
where the addresses of successive VLIWs differ from each other by
32 bits (4 bytes).

[0194] In this example, the label "LabeL" is referred to by the
instruction arranged in the third operation field of the VLIW
located at the relative address "0x1000", and this VLIW includes
the additional information "ID102, LbU12, S16, M". The label
"LabeL" is also referred to by the instruction arranged in the
third operation field of the VLIW located at the relative address
"0x1004", and this VLIW includes the additional information
"ID102, LbL4, S16".

[0195] The following is a description of the operation 
of the linker unit 17 when receiving the object code 44a 
(shown in Fig. 17D) and the relocation information 45a 
(shown in Fig. 17E). 

[0196] In this example, the label address calculation unit 22
calculates that the final size of the label "LabeL" is 28 bits by
referring to other simultaneously input object code.

[0197] The instruction insertion unit 23 sets the label "LabeL"
as the target label (step S61) and extracts the relocation
information of the target label "LabeL" from the relocation
information 45a input into the linker unit 17 (step S62).

[0198] The instruction insertion unit 23 compares the size
information "S16" included in the additional information of the
relocation information with the size "28 bits" of the target
label calculated by the label address calculation unit 22 (step
S63).

[0199] In this case, the size calculated by the label address
calculation unit 22 is greater than the size indicated by the
size information. Therefore, the instruction insertion unit 23
specifies the one of the relocation information 429 and 430 for
the label "LabeL" shown in Fig. 17E that includes the additional
information "M" (in this example, the relocation information
429). Then the instruction insertion unit 23 inserts a new VLIW
including a no-operation code (nop) and one or more divided
constant set instructions immediately before the VLIW 425
corresponding to the location information (0x1000, 3) (step S64).

[0200] As shown in Fig. 17F, the resulting VLIW 431 is
additionally inserted immediately before the VLIW 432. Note that
in this VLIW 431, the label "LabeL:12u" indicates the upper 12
bits of the 28-bit label "LabeL", that is, the bits exceeding the
16 bits indicated by the size information of the relocation
information. Also, in the VLIW 432, the divided constant
"LabeL:12m" indicates the middle 12 bits of the 28-bit label
"LabeL".
[0201] In this manner, when the temporary label size which is
assumed during compiling (the constant division by the constant
division unit 12) is different from the final label size, an
instruction is inserted to correct the difference.
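
The size comparison of step S63 and the choice of the bits to be
stored by the inserted instruction (step S64) can be illustrated
by the following minimal sketch in Python. It only computes which
bit ranges of the final label lie beyond the assumed size, in
pieces of at most 12 bits; it does not model VLIWs, relocation
information, or the nop. The names `extra_set_pieces` and
`extract_bits` and the sample address 0xABC1234 are hypothetical.

```python
def extra_set_pieces(assumed_bits, final_bits, piece_bits=12):
    """Return (high_bit, low_bit) ranges that exceed the assumed size.

    An empty list means the assumed size already covers the final size,
    so no divided constant set instruction has to be inserted.
    """
    pieces = []
    high = final_bits
    while high > assumed_bits:
        low = max(assumed_bits, high - piece_bits)
        pieces.append((high - 1, low))   # inclusive bit positions
        high = low
    return pieces


def extract_bits(value, high_bit, low_bit):
    """Extract bits high_bit..low_bit (inclusive) of value."""
    width = high_bit - low_bit + 1
    return (value >> low_bit) & ((1 << width) - 1)


# Example 2: the size was assumed to be 16 bits, the final size is 28 bits.
ranges = extra_set_pieces(16, 28)        # one range: bits 27..16
label_address = 0xABC1234                # hypothetical 28-bit address
upper = [extract_bits(label_address, hi, lo) for hi, lo in ranges]
```

For the sizes of Example 2, a single 12-bit piece (bits 27 to 16)
is found, corresponding to the divided constant "LabeL:12u" of
the inserted VLIW 431. When the assumed size equals the final
size, as in the re-executed case, no piece is returned.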

[0202] Finally, the output unit 24 generates the location
information 40 including the label size determined by the label
address calculation unit 22 and outputs the location information
together with the executable code 46 obtained after the
instruction insertion unit 23 performs the instruction insertion
process (Fig. 17F).
[0203] The location information 436 of the label "LabeL" includes
the label size "28" and the identifier "@ID102" of the
instruction referring to the label. Here, when the optimization
(described later) using this location information 40 is not
performed, this output from the linker unit becomes the final
executable code 46.

(Optimization Using Location Information 40) 

[0204] The following is a description of the operation 
of the compiler when the location information 40 de- 
scribed above is fed back into the constant division unit 
12 and the process by the following units is repeated. 
[0205] Figs. 18A-18E show the code and related information
generated by each component element when the generated location
information 40 is fed back into the constant division unit 12.

[0206] Fig. 18A shows the code generated by the constant division
unit 12 from the serial assembler code 42 shown in Fig. 17A and
the location information 40 shown in Fig. 17G. Fig. 18B shows the
parallel assembler code generated by the parallel scheduling unit
13 from the generated code. Figs. 18C and 18D respectively show
the object code 44a and the relocation information 45a generated
by the parallel assembler unit 16 from the parallel assembler
code. Fig. 18E shows the executable code 46 generated by the
linker unit 17 from the object code 44a and the relocation
information 45a.
[0207] By referring to the input location information 40, the
constant division unit 12 determines that the size of the
external label of the instruction with the instruction identifier
"ID102" is 28 bits (step S2 in Fig. 6), and divides the 28-bit
label "LabeL" (steps S3 and S4). As a result, the instruction 412
in Fig. 17A is replaced by the three instructions 442-444 shown
in Fig. 18A. The additional information "LbM12" of the
instruction 443 indicates that this instruction refers to the
middle 12 bits of the label "LabeL".

[0208] The parallel scheduling unit 13 generates parallel
assembler code without no-operation codes (nop) (see Fig. 18B) by
generating a dependence graph and relocating instructions
according to the procedure shown in Figs. 7 and 8.

[0209] The parallel assembler unit 16 generates the object code
44a corresponding to the parallel assembler code (see Fig. 18C)
and the relocation information 45a (see Fig. 18D). Note that the
legend "LabeL:12m" shown in Fig. 18C represents the middle 12
bits of the label "LabeL".

[0210] In the linker unit 17, the label address calculation unit
22 again calculates that the size of the label "LabeL" is 28
bits, and so the instruction insertion unit 23 does not insert
any further instruction concerning the label "LabeL" (step S63 in
Fig. 10). Therefore, the executable code 46 shown in Fig. 18E is
generated.
[0211] As can be seen by comparing Figs. 18E and 17F, the code
size of the executable code optimized by sending the location
information 40 back to the constant division unit 12 (see Fig.
18E) is smaller than that of the other executable code (see Fig.
17F) by one VLIW.
[0212] It should be noted here that the executable code generated
in this manner (see Fig. 18E) can be transported to target
environments equipped with the VLIW processors 100 by means of a
recording medium, such as a floppy disk, a CD-ROM, or a
semiconductor memory, or through communications via a
transmission medium.

<Example 3>

[0213] The following is a description of the operation of the
constant division unit 12 and the parallel scheduling unit 13 in
the case where the serial assembler code 42 including a branch
instruction shown in Fig. 19A is generated by the assembler code
generation unit 11 and is input into the constant division unit
12.
[0214] Fig. 19A shows the serial assembler code 42 generated by
the assembler code generation unit 11 of Example 3.

[0215] The branch instruction 473 in this figure is described
below.

(Instruction 473) caLL _func

[0216] The execution control of the VLIW processor 100 moves to
the branch label "_func".
[0217] In this example, the size of this branch label "_func" is
12 bits, and the argument R1 is transferred to the function
"_func" when control branches to the function.

[0218] Fig. 19B shows the code generated by the constant division
unit 12 from the serial assembler code 42 shown in Fig. 19A.

[0219] The branch instruction 473 is accompanied by the 12-bit
branch label "_func", so the instruction 473 is divided into the
divided constant set instruction 477 for storing the branch label
"_func" in the constant buffer 107 and the divided constant use
instruction 478 equivalent to the operation code "caLL" of the
branch instruction 473.

[0220] Fig. 19C shows the dependence graph generated by the
dependence graph generation unit 20 in the case where the code
shown in Fig. 19B is input.
[0221] Because the argument R1 is used in the function "_func",
the branch instruction 478 depends on the instruction 474.

[0222] Fig. 19D shows the outputable instruction group and the
output schedule instruction group temporarily generated by the
instruction relocation unit 21 in the case where the code shown
in Fig. 19B and the dependence graph shown in Fig. 19C are input.
[0223] In the second scheduling cycle, the branch instruction
(caLL) 478 is included in the output schedule instruction group.

[0224] Fig. 19E shows the parallel assembler code generated by
the instruction relocation unit 21. This figure also shows the
code in the first field 51 of the VLIWs.
[0225] In this manner, the constant division unit 12 and the
parallel scheduling unit 13 of the present compiler generate the
parallel assembler code of two VLIWs shown in Fig. 19E from the
serial assembler code 42 shown in Fig. 19A.

[0226] The following is a description of the case where 
the same serial assembler code 42 is input into an ordi- 
nary compiler. 

[0227] Fig. 16 shows parallel assembler code which may be
generated by the ordinary compiler.
[0228] The ordinary compiler cannot divide the branch instruction
473 shown in Fig. 19A, so a field of at least 13 bits, namely two
successive fields, is required for this instruction. Therefore,
parallel assembler code of three VLIWs 940-942 is generated and
there are many redundant areas in these VLIWs.

<Example 4> 

[0229] The following description centers on the operation of the
constant combination unit 14 in the case where the serial
assembler code 42 shown in Fig. 20A is generated by the assembler
code generation unit 11 and is input into the constant division
unit 12.
[0230] Fig. 20A shows the serial assembler code 42 
generated by the assembler code generation unit 11 of 
Example 4. 

[0231] Note that, in this example, while the size of the branch
label "_func" used by the branch instruction 503 is 12 bits as in
Example 3, no argument is transferred to the function "_func"
when control branches to the function.

[0232] Fig. 20B shows the code generated by the con- 
stant division unit 12 from the serial assembler code 42 
shown in Fig. 20A. 

[0233] The branch instruction 503 is divided into the divided
constant set instruction 507 and the divided constant use
instruction 508, as in Example 3.
[0234] Fig. 20C shows a dependence graph generat- 
ed by the dependence graph generation unit 20 from the 
code shown in Fig. 20B. 

[0235] Fig. 20D shows the outputable instruction 
group and the output schedule instruction group tempo- 
rarily generated by the instruction relocation unit 21 from 
the code shown in Fig. 20B and the dependence graph 
shown in Fig. 20C. 

[0236] Unlike Example 3, in the second scheduling 
cycle, the divided constant set instruction 507 and the 
divided constant use instruction 508 are included in the 
output schedule instruction group. 
[0237] Fig. 20E shows the parallel assembler code 
generated by the instruction relocation unit 21. 
[0238] The parallel assembler code is composed of two VLIWs 509
and 510. The instructions 507 and 508 generated from the branch
instruction 503 are arranged in the first field 51 and the second
field 52 of the VLIW 510, respectively.
[0239] Fig. 20F shows the code generated by the constant
combination unit 14 from the parallel assembler code shown in
Fig. 20E.

[0240] The constant combination unit 14 detects that the divided
constant set instruction 507 and the divided constant use
instruction 508 generated from the same long constant use
instruction (the branch instruction 503) are arranged in the same
VLIW 510. Accordingly, the constant combination unit 14 replaces
these instructions 507 and 508 with a long constant use
instruction (an instruction of the same format as the original
branch instruction 503) obtained by combining the instructions
507 and 508. This solves problems caused by the constant division
unit 12 unnecessarily dividing a constant (that is, dividing a
constant whose divided pieces did not need to be arranged in a
plurality of VLIWs).
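
The detection and replacement performed by the constant
combination unit 14 can be illustrated by the following minimal
sketch in Python. It assumes, as in Example 4, that each long
constant use instruction was divided into exactly one divided
constant set instruction and one divided constant use
instruction; an instruction divided into more pieces would need
an additional check that all pieces lie in the same VLIW. The
class `Instr`, its fields, and the function `combine_vliw` are
hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    ident: int                     # instruction number
    kind: str                      # "set", "use", "long", or "plain"
    origin: Optional[int] = None   # instruction a divided piece came from


def combine_vliw(vliw: List[Instr]) -> List[Instr]:
    """Merge a set/use pair of the same origin found in one VLIW."""
    set_origins = {i.origin for i in vliw if i.kind == "set"}
    use_origins = {i.origin for i in vliw if i.kind == "use"}
    mergeable = set_origins & use_origins
    combined: List[Instr] = []
    emitted = set()
    for ins in vliw:
        if ins.kind in ("set", "use") and ins.origin in mergeable:
            # Emit one long constant use instruction in place of the pair.
            if ins.origin not in emitted:
                combined.append(Instr(ins.origin, "long"))
                emitted.add(ins.origin)
        else:
            combined.append(ins)
    return combined


# VLIW 510 of Example 4: instructions 507 and 508, both from instruction 503.
vliw_510 = [Instr(507, "set", 503), Instr(508, "use", 503)]
merged = combine_vliw(vliw_510)
```

Applied to the VLIW 510, the sketch returns a single long
instruction carrying the number 503. A VLIW that contains only
the set instruction of a pair is returned unchanged.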

[0241] The target processor of the compiler of the embodiment is
similar to the VLIW processor disclosed by Japanese Laid-Open
Patent Application H9-159058 or H9-159059. The present compiler
may be used for any constant reconstruction processor executing a
program which is made by dividing instructions into parts and
arranging the divided instruction parts in a plurality of VLIWs.

[0242] While the compiler of the embodiment generates VLIWs in
the two formats shown in Figs. 2A and 2B, this compiler may
generate VLIWs in any type of format, such as VLIWs which are
each composed of three 16-bit operation fields. This is because
the present invention is a technique for dividing constants
included in instructions and performing parallel scheduling
according to the size of the operation fields of VLIWs.
[0243] The VLIW processor 100, which is the target processor of
the compiler of the embodiment, includes a 32-bit shift register
(the constant buffer 107). The shift register is filled with 0s
immediately after a value stored in the shift register is
referred to. However, the present invention is not limited to a
processor including a constant buffer 107 functioning like this.
The present invention may be used for a processor including a
constant buffer for storing two or more independent constants and
using instructions that explicitly indicate the storage areas of
the constants and clear the used content. More specifically, when
a divided constant set instruction is generated, an instruction
for indicating the storage area may also be generated. And when a
divided constant use instruction is generated, an instruction for
clearing the content may also be generated.
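
The behavior of a constant buffer of the shift register type can
be illustrated by the following minimal sketch in Python. Only the
32-bit width and the clearing to 0s after a reference are taken
from the description above; the manner in which pieces are
shifted in (a left shift by the piece width followed by an OR) is
an assumption made for this illustration, as are the names
`ConstantBuffer`, `set_piece`, and `use` and the sample value
0xABC1234.

```python
class ConstantBuffer:
    """Toy model of a 32-bit shift-register constant buffer."""

    WIDTH = 32

    def __init__(self):
        self.value = 0

    def _shift_in(self, piece, bits):
        if piece < 0 or piece >> bits:
            raise ValueError("piece does not fit in the given width")
        mask = (1 << self.WIDTH) - 1
        # Assumed behavior: shift the old content left, append the piece.
        self.value = ((self.value << bits) | piece) & mask

    def set_piece(self, piece, bits=12):
        """Model of a divided constant set instruction."""
        self._shift_in(piece, bits)

    def use(self, piece=0, bits=0):
        """Model of a divided constant use instruction.

        Appends the optional short constant, returns the reconstructed
        constant, and fills the buffer with 0s.
        """
        self._shift_in(piece, bits)
        result, self.value = self.value, 0
        return result


buf = ConstantBuffer()
buf.set_piece(0xABC)                 # upper 12 bits
buf.set_piece(0x123)                 # middle 12 bits
reconstructed = buf.use(0x4, 4)      # lower 4 bits, then clear
```

Under these assumptions the three pieces are reconstructed into
the 28-bit value 0xABC1234, and the buffer holds 0 afterwards so
that the next constant can be assembled from an empty state.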
[0244] In the embodiment, fixed values such as 4 bits or 12 bits
are used when constants are divided. However, the present
invention is not limited to these values.
[0245] In the embodiment, when the size of a label cannot be
determined, the size is assumed to be the most common address
size (16 bits). However, the size may be assumed to be the
maximum address size of the target processor. Also, when a
constant whose size cannot be determined is an operand for a
transfer/arithmetic logic instruction, the size of the constant
may be assumed to be the maximum constant size of the target
processor or to be the most common constant size. The assumed
size may be pre-stored as a default value or may be specified by
a user as an option when the compiler is activated.

[0246] As can be understood from the specific description of step
S4 in Fig. 6, the function of the constant division unit 12 can
be expressed from two points of view. From the first point of
view, the constant division unit 12 functions as a means for
dividing each instruction including a constant among the input
instructions (each long constant use instruction) into a
plurality of instructions (one or more divided constant set
instructions and a divided constant use instruction), as shown in
Fig. 21A. From the second point of view, the constant division
unit 12 functions as a means for dividing each constant in the
input instructions (each 28-bit long constant) into a plurality
of parts and for generating a plurality of instructions
respectively including the divided constant parts (a divided
constant set instruction for the upper 12-bit divided constant, a
divided constant set instruction for the middle 12-bit divided
constant, and a divided constant use instruction including the
lower 4-bit short constant) according to the input instructions,
as shown in Fig. 21B.
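
The second point of view can be illustrated by the following
minimal sketch in Python, which divides a constant into 12-bit
pieces for divided constant set instructions and a 4-bit piece
for the divided constant use instruction. The function name
`divide_constant`, the tuple representation of the generated
instructions, and the sample values are hypothetical; the piece
widths of 12 bits and 4 bits follow the embodiment.

```python
def divide_constant(value, size_bits, set_bits=12, use_bits=4):
    """Divide a constant into ("set", piece)... and one ("use", piece).

    The lowest `use_bits` bits stay with the use instruction; the
    remaining upper bits are cut into `set_bits`-wide pieces, listed
    from the most significant piece to the least significant one.
    """
    if value < 0 or value >> size_bits:
        raise ValueError("value does not fit in size_bits")
    low = value & ((1 << use_bits) - 1)
    upper = value >> use_bits
    upper_bits = size_bits - use_bits
    set_pieces = []
    while upper_bits > 0:
        set_pieces.append(upper & ((1 << set_bits) - 1))
        upper >>= set_bits
        upper_bits -= set_bits
    set_pieces.reverse()             # most significant piece first
    return [("set", p) for p in set_pieces] + [("use", low)]


# A hypothetical 28-bit constant: upper 12, middle 12, and lower 4 bits.
pieces_28 = divide_constant(0xABC1234, 28)
# A hypothetical 16-bit constant: upper 12 and lower 4 bits.
pieces_16 = divide_constant(0x1234, 16)
```

A 28-bit constant thus yields two set pieces and one use piece,
and a 16-bit constant yields one set piece and one use piece,
matching the divisions described for the label "LabeL" in
Example 2.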

[0247] In Example 1, the instruction relocation unit 21 adds the
instruction 404 to the instruction 406 remaining in the
outputable instruction group in the first scheduling cycle to
proceed to the second cycle. However, the instruction relocation
unit 21 may clear the content of the outputable instruction group
and recalculate it for each cycle.

[0248] It should be noted here that executable code 
generated by the compiler of the embodiment can be 
transported to a target environment which executes the 
generated code by means of a recording medium, such 
as a floppy disk, a CD-ROM, or a semiconductor mem- 
ory, or through communications via a transmission me- 
dium. 

[0249] Fig. 22 shows a simplified content of the CD-ROM 200
recording the VLIW sequence 201 shown in Fig. 18E and the VLIW
sequence 202 shown in Fig. 19E generated by the compiler of the
embodiment. In the VLIW sequence 201, the two VLIWs 458 and 459
include constants to be combined and stored in the storage buffer
of the processor implicitly indicated by the VLIW sequence. The
VLIW 460 follows both the VLIWs 458 and 459 and is the first VLIW
to refer to the storage buffer. This VLIW 460 includes a constant
and the instruction (Ld) for using a constant obtained by
combining the constant and the constants included in the two or
more VLIWs. In the VLIW sequence 202, the VLIW 480 includes the
constant (sfst) to be stored into the storage buffer of the
processor implicitly indicated by the VLIW sequence. The VLIW
481, which follows the VLIW 480 and is the first VLIW to refer to
the storage buffer, includes the instruction (caLL) for using the
constant (_func) stored in the storage buffer.
[0250] The compiler itself of the present invention may also be
stored in a recording medium, such as a floppy disk, a CD-ROM, or
a semiconductor memory, and be distributed, like executable code
obtained by the compiler.


Claims 


1. A recording medium recording a program for converting an
instruction sequence composed of serially arranged instructions
into a VLIW (Very Long Instruction Word) sequence for a
processor, the program comprising:

a division step for dividing each instruction including a
constant in the instruction sequence into a plurality of divided
instructions;
an analysis step for analyzing dependence relations between each
instruction in the instruction sequence including divided
instructions generated in the division step according to an
execution order of each instruction in the instruction sequence;
and
a relocation step for relocating instructions in the instruction
sequence in compliance with the analyzed dependence relations to
generate VLIWs which are each composed of a plurality of
instructions that are executable in parallel.

2. The recording medium of Claim 1,

wherein the division step includes:
an instruction size judgement substep for performing an
instruction size judgement as to whether a size of an instruction
including a constant is equal to or smaller than a size of each
unit operation field in a VLIW; and
a division substep which, when the size of the instruction
including the constant is judged to be greater than the size of
each unit operation field, divides the instruction including the
constant into a plurality of divided instructions whose sizes are
each equal to or smaller than the size of each unit operation
field.

3. The recording medium of Claim 2, 

wherein in the division substep, the instruction including the
constant is divided into one or more instructions for storing the
constant into a storage buffer of the processor and an
instruction for using the stored constant.

4. The recording medium of Claim 3,

wherein the program further comprises a combination step which,
when two or more divided instructions generated from a same
instruction including a constant in the division substep are
arranged in a same VLIW in the relocation step, combines the two
or more divided instructions into one instruction.

5. The recording medium of Claim 4,

wherein in the instruction size judgement substep, the
instruction size judgement is performed using an assumed size for
the constant, and

the program further comprises:
a constant size determination step for linking a plurality of
VLIW sequences and determining a final size of each constant; and
an insertion step which, when the final size is greater than the
assumed size, generates an instruction for storing into the
storage buffer a divided constant corresponding to a difference
between the final size and the assumed size and inserts the
generated instruction into a corresponding VLIW sequence.

6. The recording medium of Claim 5,

wherein in the instruction size judgement substep, the assumed
size is set to a maximum constant size manageable by the
processor.

7. The recording medium of Claim 6,

wherein the program further comprises a step for re-executing the
division step after the constant size determination step,
wherein in the instruction size judgement substep in the
re-executed division step, the instruction size judgement is
performed in consideration of the final size determined in the
constant size determination step.


8. The recording medium of Claim 7,

wherein the program further comprises a step for re-executing the
analysis step and the relocation step following the re-executed
division step.

9. The recording medium of Claim 5, 

wherein in the instruction size judgement sub- 
step, the assumed size of the constant is set to a 
most commonly used constant size. 

10. The recording medium of Claim 5,

wherein the program further comprises a step for re-executing the
division step after the constant size determination step,
wherein in the instruction size judgement substep in the
re-executed division step, the instruction size judgement is
performed in consideration of the final size determined in the
constant size determination step.

11. The recording medium of Claim 2,

wherein in the division substep, the instruction including the
constant is divided into one or more instructions for
respectively storing one or more divided constants into a storage
buffer of the processor and an instruction for using the stored
divided constants, wherein the divided constants are obtained by
dividing the constant.

12. The recording medium of Claim 11,

wherein the program further comprises a combination step which,
when two or more divided instructions generated from a same
instruction including a constant in the division substep are
arranged in a same VLIW in the relocation step, combines the two
or more divided instructions into one instruction.

13. The recording medium of Claim 12,

wherein in the instruction size judgement substep,
the instruction size judgement is performed
using an assumed size for the constant,
and

the program further comprises:
a constant size determination step for linking a
plurality of VLIW sequences and determining a
final size of each constant; and
an insertion step which, when the final size is
greater than the assumed size, generates an
instruction for storing into the storage buffer a
divided constant corresponding to a difference
between the final size and the assumed size
and inserts the generated instruction into a
corresponding VLIW sequence.
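
The insertion step of Claim 13 can be illustrated with a small sketch. It assumes that the "difference" is the group of upper bits not covered by the assumed size, and that the new store instruction is placed in a VLIW of its own just before a caller-supplied position; the function names, the tuple form of the store instruction, and the `sfst` mnemonic (borrowed from Fig. 21A) are illustrative choices, not details confirmed by the claim.

```python
def extra_piece(value, assumed_bits, final_bits):
    """Return (piece, width) for the bits beyond the assumed size, or None."""
    if final_bits <= assumed_bits:
        return None  # the assumed size was sufficient
    width = final_bits - assumed_bits
    return (value >> assumed_bits) & ((1 << width) - 1), width


def insert_store(sequence, position, label, value, assumed_bits, final_bits):
    """Insert a store for the uncovered upper bits before `position`.

    `sequence` is a list of VLIWs, each a list of instructions.
    A new list is returned; the input is left unmodified.
    """
    piece = extra_piece(value, assumed_bits, final_bits)
    if piece is None:
        return list(sequence)
    store = ("sfst", label, piece[0], piece[1])
    # Assumption: the store gets its own VLIW; a real scheduler might
    # instead look for a free slot in an existing one.
    return sequence[:position] + [[store]] + sequence[position:]
```

With an assumed size of 16 bits and a final size of 28 bits, the sketch produces one additional 12-bit store ahead of the using VLIW.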

14. The recording medium of Claim 13,

wherein in the instruction size judgement substep,
the assumed size is set to a maximum constant
size manageable by the processor.

15. The recording medium of Claim 14,

wherein the program further comprises a step
for re-executing the division step after the constant
size determination step,
wherein in the instruction size judgement substep
in the re-executed division step, the instruction
size judgement is performed in consideration
of the final size determined in the
constant size determination step.

16. The recording medium of Claim 15,

wherein the program further comprises a step
for re-executing the analysis step and the relocation
step following the re-executed division step.

17. The recording medium of Claim 13,

wherein in the instruction size judgement substep,
the assumed size of the constant is set to a
most commonly used constant size.

18. The recording medium of Claim 13,

wherein the program further comprises a step
for re-executing the division step after the constant
size determination step,
wherein in the instruction size judgement substep
in the re-executed division step, the instruction
size judgement is performed in consideration
of the final size determined in the
constant size determination step.

19. A recording medium recording a program for converting
an instruction sequence composed of serially
arranged instructions into a VLIW sequence for
a processor, the program comprising:

a division step for dividing each constant in the
instruction sequence into a plurality of divided
constants and generating a plurality of instructions
which each include one of the plurality of
divided constants;

an analysis step for analyzing dependence relations
between each instruction in the instruction
sequence including the generated plurality
of instructions according to an execution order
of each instruction in the instruction sequence;
and

a relocation step for relocating instructions in
the instruction sequence in compliance with the
analyzed dependence relations to generate
VLIWs which are each composed of a plurality
of instructions that are executable in parallel.

20. The recording medium of Claim 19,

wherein the division step includes:
an instruction size judgement substep for performing
an instruction size judgement as to
whether a size of the constant is equal to or
smaller than a size of each unit operation field
in a VLIW; and

a division substep which, when the size of the
constant is judged to be greater than the size
of each unit operation field, divides the constant
into a plurality of divided constants whose sizes
are equal to or smaller than the size of each
unit operation field.

21. The recording medium of Claim 20,

wherein in the division step, an instruction for
storing a divided constant obtained in the division
substep into a storage buffer of the processor and
an instruction for using the stored divided constant
are generated.

22. The recording medium of Claim 21,

wherein the program further comprises a
combination step which, when two or more instructions
generated in the division step are arranged in
a same VLIW in the relocation step, combines the
two or more instructions.

23. The recording medium of Claim 22,

wherein in the instruction size judgement substep,
the instruction size judgement is performed
using an assumed size for the constant,
and

the program further comprises:

a constant size determination step for linking a
plurality of VLIW sequences and determining a
final size of the constant; and
an insertion step which, when the final size is
greater than the assumed size, generates an
instruction for storing into the storage buffer a
divided constant corresponding to a difference
between the final size and the assumed size
and inserts the generated instruction into a
corresponding VLIW sequence.

24. The recording medium of Claim 23,

wherein in the instruction size judgement substep,
when the final size has not been determined,
the assumed size is set to a maximum constant size
manageable by the processor.

25. The recording medium of Claim 24,

wherein the program further comprises a step
for re-executing the division step after the constant
size determination step,
wherein in the instruction size judgement substep
in the re-executed division step, the instruction
size judgement is performed in consideration
of the final size determined in the
constant size determination step.

26. The recording medium of Claim 25,

wherein the program further comprises a step
for re-executing the analysis step and the relocation
step following the re-executed division step.

27. The recording medium of Claim 23,

wherein in the instruction size judgement substep,
when the final size has not been determined,
the assumed size of the constant is set to be a most
commonly used constant size.

28. The recording medium of Claim 23,

wherein the program further comprises a step 
for re-executing the division step after the con- 
stant size determination step, 
wherein in the instruction size judgement sub- 
step in the re-executed division step, the in- 
struction size judgement is performed in con- 
sideration of the final size determined in the 
constant size determination step. 

29. A recording medium recording a VLIW sequence for
a processor which executes a plurality of instructions
in parallel,

wherein a first VLIW in the VLIW sequence includes
a constant to be stored into a storage
buffer of the processor implicitly indicated by at
least one VLIW in the VLIW sequence, and
a second VLIW, which follows the first VLIW
and is a first to refer to the storage buffer after
the first VLIW, includes an instruction for using
the constant in the storage buffer.
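
How a VLIW sequence of the kind recited in Claim 29 and the claims that follow might behave at run time can be sketched with a toy model of the storage buffer. The shift-and-append behaviour of `store`, the clearing of the buffer on `use`, and the class and method names are assumptions for illustration; the claims do not specify them.

```python
class ConstantBuffer:
    """Toy model of the processor's storage buffer for constants."""

    def __init__(self):
        self.value = 0  # bits accumulated so far
        self.bits = 0   # how many bits `value` holds

    def store(self, piece, width):
        # Assumption: a store shifts the existing contents left and
        # appends the new piece in the low bits.
        self.value = (self.value << width) | (piece & ((1 << width) - 1))
        self.bits += width

    def use(self, piece=0, width=0):
        # The using instruction may carry a final piece of its own, which
        # is combined with the buffered bits. Assumption: the buffer is
        # cleared afterwards so the next constant starts from empty.
        combined = (self.value << width) | (piece & ((1 << width) - 1))
        self.value = 0
        self.bits = 0
        return combined
```

Calling `store` once and then `use()` with no arguments models a using instruction that holds no constant of its own, such as a branch without a branch address; passing a piece to `use` models a using instruction that supplies the lowest bits itself.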

30. The recording medium of Claim 29,

wherein the constant included in the first VLIW
is a branch address, and
the instruction included in the second VLIW is
a branch instruction that does not include a
branch address.

31. A recording medium recording a VLIW sequence for
a processor which executes a plurality of instructions
in parallel,

wherein a first VLIW in the VLIW sequence includes
a constant to be stored into a storage
buffer of the processor implicitly indicated by at
least one VLIW in the VLIW sequence, and
a second VLIW, which follows the first VLIW
and is a first to refer to the storage buffer after
the first VLIW, includes a constant and an instruction
for using a constant obtained by combining
the constant included in the second
VLIW and the constant in the storage buffer.

32. A recording medium recording a VLIW sequence for
a processor which executes a plurality of instructions
in parallel,

wherein the VLIW sequence includes two or
more VLIWs which each include a constant to
be stored and combined in a storage buffer of
the processor implicitly indicated by at least
one VLIW in the VLIW sequence, and
another VLIW, which follows the two or more
VLIWs and is a first to refer to the storage buffer
after the two or more VLIWs, includes an instruction
for using the combined constant in the
storage buffer.

33. A recording medium recording a VLIW sequence for
a processor which executes a plurality of instructions
in parallel,

wherein the VLIW sequence includes two or
more VLIWs which each include a constant to
be stored and combined in a storage buffer of
the processor implicitly indicated by at least
one VLIW in the VLIW sequence, and
a second VLIW, which follows the two or more
VLIWs and is a first to refer to the storage buffer
after the two or more VLIWs, includes a constant
and an instruction for using a constant obtained
by combining the constant included in
the second VLIW and the constants in the storage
buffer.

34. An apparatus for converting an instruction sequence
composed of serially arranged instructions
into a VLIW sequence for a processor, the apparatus
comprising:

a division means for dividing each instruction
including a constant in the instruction sequence
into a plurality of divided instructions;

an analysis means for analyzing dependence
relations between each instruction in the instruction
sequence including divided instructions
generated in the division step according
to an execution order of each instruction in the
instruction sequence; and

a relocation means for relocating instructions in
the instruction sequence in compliance with the
analyzed dependence relations to generate
VLIWs which are each composed of a plurality
of instructions that are executable in parallel.

35. An apparatus for converting an instruction sequence
composed of serially arranged instructions
into a VLIW sequence for a processor, the apparatus
comprising:

a division means for dividing each constant in
the instruction sequence into a plurality of divided
constants and generating a plurality of instructions
which each include one of the plurality
of divided constants;
an analysis means for analyzing dependence
relations between each instruction in the instruction
sequence including the generated
plurality of instructions according to an execution
order of each instruction in the instruction
sequence; and

a relocation means for relocating instructions in
the instruction sequence in compliance with the
analyzed dependence relations to generate
VLIWs which are each composed of a plurality
of instructions that are executable in parallel.
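
The analysis and relocation recited in Claim 35 can be illustrated with a small greedy scheduler. This is a sketch under stated assumptions rather than the apparatus itself: the `Instr` record, the explicit read and write sets, the pseudo-resource `"CB"` standing for the storage buffer, the source-then-destination operand order, and the two-slot VLIW width are all hypothetical. The instruction texts follow the second sequence shown in Fig. 22.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset
    writes: frozenset


def analyze(instrs):
    """For each instruction, collect the earlier ones it depends on."""
    deps = []
    for j, later in enumerate(instrs):
        preds = set()
        for i in range(j):
            earlier = instrs[i]
            if (earlier.writes & later.reads         # read after write
                    or earlier.reads & later.writes  # write after read
                    or earlier.writes & later.writes):  # write after write
                preds.add(i)
        deps.append(preds)
    return deps


def relocate(instrs, slots=2):
    """Pack instructions into VLIWs without violating dependences."""
    deps = analyze(instrs)
    placed = set()
    remaining = list(range(len(instrs)))
    vliws = []
    while remaining:
        word = []
        for i in remaining:
            # An instruction may join this VLIW only if everything it
            # depends on was placed in an earlier VLIW.
            if len(word) < slots and deps[i] <= placed:
                word.append(i)
        placed.update(word)
        remaining = [i for i in remaining if i not in placed]
        vliws.append([instrs[i].text for i in word])
    return vliws


def make(text, reads, writes):
    return Instr(text, frozenset(reads), frozenset(writes))


serial = [
    make("mov R0,R1", {"R0"}, {"R1"}),
    make("add R1,R2", {"R1", "R2"}, {"R2"}),
    make("add R2,R4", {"R2", "R4"}, {"R4"}),
    make("sfst Label:12u", {"CB"}, {"CB"}),
    make("sfst Label:12m", {"CB"}, {"CB"}),
    make("ld (Label:4l),R3", {"CB"}, {"R3", "CB"}),
]

for word in relocate(serial):
    print("; ".join(word))
```

Because the divided instructions depend only on one another through the buffer, they can fill slots that the register-dependent chain of `mov` and `add` instructions would otherwise leave empty, so the six serial instructions fit into three two-slot VLIWs in this example.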
[Figure fragment: VLIWs 511 and 512. Only the instructions "mov R0,R2", "mov R1,R3" and "add R3,R0" are legible.]

EP 0 899 656 A2

Fig. 21A

ld (Label),R3

sfst Label:12u
sfst Label:12m
ld (Label:4l),R3

Fig. 21B

ld (Label),R3

Label:12u -> sfst Label:12u
Label:12m -> sfst Label:12m
Label:4l -> ld (Label:4l),R3

Fig. 22

480  nop; mov R0,R1; sfst _func
481  call; mov R1,R2; mov R1,R3

mov R0,R1; sfst Label:12u
add R1,R2; sfst Label:12m
add R2,R4; ld (Label:4l),R3